The National Board for Professional Teaching Standards (NBPTS) establishes standards for accomplished teachers and awards professional certification to teachers who can demonstrate that their teaching …

The What Works Clearinghouse (WWC) identified five studies of NBPTS certification that both fall within the scope of the Teacher Training, Evaluation, and Compensation topic area and meet WWC group design standards. No studies meet WWC group design standards without reservations, and five studies meet WWC group design standards with reservations. Together, these studies included more than 1,316,146 elementary and middle school students in grades 3 to 8 in four states.


According to the WWC review, the extent of evidence on the effects of NBPTS-certified teachers on the academic achievement of elementary and middle school students was medium to large for two student outcome domains: English language arts achievement and mathematics achievement. No studies meet WWC group design standards in the four other student outcome domains or in the 11 teacher outcome domains, so this intervention report does not report on the effectiveness of NBPTS-certified teachers for those domains. (See the Effectiveness Summary on p. 6 for more details of effectiveness by domain.)


Effectiveness 


NBPTS-certified teachers had mixed effects on mathematics achievement and no discernible effects on English 
language arts achievement for students in grades 3 through 8. 
Table 1. Summary of findings

| Outcome domain | Rating of effectiveness | Improvement index: average (percentile points) | Improvement index: range (percentile points) | Number of studies | Number of students | Extent of evidence |
|---|---|---|---|---|---|---|
| Mathematics achievement | Mixed effects | +1 | 0 to +2 | 3 | 1,316,146 | Medium to large |
| English language arts achievement | No discernible effects | +2 | 0 to +4 | 4 | 1,242,454 | Medium to large |
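The improvement index in Table 1 expresses an effect size in percentile points. A minimal sketch, assuming the WWC's usual definition (the expected percentile-rank change for an average comparison-group student, computed as 100 times the standard normal CDF of the effect size, minus 50); the function names and the example effect size are hypothetical and are not taken from the studies in this report.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def improvement_index(effect_size: float) -> float:
    """Percentile-point change for an average comparison-group student.

    Assumes the WWC convention 100 * Phi(g) - 50, where g is a
    standardized mean difference (effect size).
    """
    return 100.0 * normal_cdf(effect_size) - 50.0

if __name__ == "__main__":
    # Hypothetical effect size, for illustration only.
    g = 0.05
    print(f"g = {g:+.2f} -> improvement index {improvement_index(g):+.0f}")
```

Because the normal CDF is symmetric, equal-sized positive and negative effects map to improvement indices of equal magnitude and opposite sign.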
Intervention Information
Background 


The NBPTS was founded in 1987. The organization continues to update the standards and award certifications. 
Address: 1525 Wilson Blvd., Ste. 700, Arlington, VA 22209. Web: http://www.nbpts.org/. Telephone: (703) 465-2700. 


Intervention details 


The NBPTS offers certificates in 16 content areas for teachers working in pre-K through grade 12. For many of the content areas, certificates are available for teaching students in different age groups. In general, to be eligible for certification, a teacher must hold a bachelor’s degree and a valid state teaching license and must have completed 3 years of teaching. Requirements vary for teachers pursuing the Career and Technical Education, School Counseling, and World Language certifications.


The certification process includes tasks associated with each of four components: (1) content knowledge, (2) differentiation in instruction, (3) teaching practice and learning environment, and (4) effective and reflective practitioner. Candidates receive an assessment score for each component. To achieve certification, candidates must meet or exceed the minimum score for each individual component and a minimum combined score across the four components. Candidates choose which components to attempt in a given year, must complete a first attempt at all components within 3 years, and have up to 5 years to achieve the required minimum scores for all components. Those who do not attain the minimum score(s) can retake a component up to two times within that time frame.
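The two-part scoring rule above (every component must clear its own minimum, and the four scores together must clear a combined minimum) can be sketched as follows. This is a hypothetical illustration: the report does not state the actual NBPTS minimum scores, so the threshold values and names below are placeholders.

```python
from typing import Mapping

# Placeholder thresholds: the report does not give NBPTS's actual
# minimum scores, so these numbers are hypothetical.
COMPONENT_MINIMUMS = {
    "content_knowledge": 2.0,
    "differentiation_in_instruction": 2.0,
    "teaching_practice_and_learning_environment": 2.0,
    "effective_and_reflective_practitioner": 2.0,
}
COMBINED_MINIMUM = 10.0

def achieves_certification(
    scores: Mapping[str, float],
    component_minimums: Mapping[str, float] = COMPONENT_MINIMUMS,
    combined_minimum: float = COMBINED_MINIMUM,
) -> bool:
    """True only if every component meets its own minimum AND the
    combined score across all components meets the combined minimum."""
    if set(scores) != set(component_minimums):
        raise ValueError("a score is required for each of the four components")
    each_component_ok = all(scores[c] >= m for c, m in component_minimums.items())
    combined_ok = sum(scores.values()) >= combined_minimum
    return each_component_ok and combined_ok

if __name__ == "__main__":
    # A high combined score cannot offset one component below its minimum.
    scores = {c: 3.0 for c in COMPONENT_MINIMUMS}
    scores["content_knowledge"] = 1.9
    print(achieves_certification(scores))
```

The design point is that the two checks are conjunctive: strength in one component does not compensate for a below-minimum score in another, and clearing every individual minimum is not enough if the total falls short.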


The first component, content knowledge, is assessed through a computer-administered test, specific to each certification area, that consists of three constructed-response exercises and 45 multiple-choice items. The content knowledge assessment takes a minimum of 2.5 hours to complete. The differentiation in instruction component is assessed via a written reflection on students’ work, which includes a collection of students’ work and a commentary connecting the teacher’s instructional choices to students’ growth. The teaching practice and learning environment component is assessed via a written self-reflective analysis of teaching practice; scores for this component are based on video recordings of teachers’ interactions with their students and the teachers’ written analyses of those interactions. For the effective and reflective practitioner component, candidates must document their knowledge and use of assessment and their collaboration with families and colleagues, and they must comment on how those activities affected students’ learning.


Teachers who obtained NBPTS certification before 2017 must fulfill certain requirements to renew their certification every 10 years. This process requires demonstrating professional growth through recordings of teaching and students’ work, as well as a written analysis of teaching practices and plans for continued professional growth. Those certified in 2017 and after will be required to maintain their certification every 5 years.


Cost 


As of April 2017, NBPTS certification candidates pay a $75 registration fee and $475 for each of the four components of certification; thus, the total minimum cost for certification is $1,975. Additional fees apply for candidates who have to repeat requirements to complete a component or who change certification areas during the application process. For teachers certified before 2017, the fee for certification renewal is $1,250. Some states and localities provide subsidies to cover part of the cost of certification. Many states and school districts offer salary increases or bonuses for teachers who become certified through the NBPTS.
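The $1,975 minimum is simple arithmetic on the April 2017 fees. A small sketch, assuming no retakes, no change of certification area, and no state or local subsidy (the report does not price those cases); the function name is hypothetical.

```python
REGISTRATION_FEE = 75     # USD, as of April 2017 (from the report)
FEE_PER_COMPONENT = 475   # USD per component (from the report)
NUM_COMPONENTS = 4

def minimum_certification_cost(components: int = NUM_COMPONENTS) -> int:
    """Registration fee plus one fee per component.

    Assumes no retakes, no change of certification area, and no subsidy.
    """
    return REGISTRATION_FEE + components * FEE_PER_COMPONENT

print(minimum_certification_cost())  # 1975
```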
Research Summary


The WWC identified 39 eligible studies that investigated the effects of NBPTS-certified teachers on academic achievement for elementary and middle school students. An additional 109 studies were identified but do not meet WWC eligibility criteria for review in this topic area (see the Glossary of Terms in this document for a definition of this term and other commonly used research terms). Citations for all 148 studies are in the References section, which begins on p. 9.

Table 2. Scope of reviewed research
Intervention type: Teacher level


The WWC reviewed 38 eligible studies against group design standards. No studies are randomized controlled trials 
that meet WWC group design standards without reservations, and five studies use quasi-experimental designs that 
meet WWC group design standards with reservations. This report summarizes those five studies. The remaining 33 
studies do not meet WWC group design standards. 


The WWC reviewed one eligible study against pilot regression discontinuity design standards. This study does not 
meet WWC pilot regression discontinuity design standards. 
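The study counts in this section reconcile as shown below. The sketch only restates the report's own numbers and assumes that each eligible study was reviewed against exactly one set of standards (group design or pilot regression discontinuity); the variable names are illustrative.

```python
# Study counts as stated in the Research Summary.
eligible = 39
ineligible = 109
group_design_reviewed = 38
rdd_reviewed = 1
meet_without_reservations = 0
meet_with_reservations = 5
do_not_meet_group = 33

# All cited studies = eligible + ineligible.
assert eligible + ineligible == 148
# Every eligible study was reviewed under one set of standards.
assert group_design_reviewed + rdd_reviewed == eligible
# Group-design outcomes partition the 38 group-design reviews.
assert (meet_without_reservations + meet_with_reservations
        + do_not_meet_group) == group_design_reviewed
print("Study counts are internally consistent.")
```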


Summary of studies meeting WWC group design standards without reservations 
No studies of the effects of NBPTS-certified teachers meet WWC group design standards without reservations. 


Summary of studies meeting WWC group design standards with reservations 


Cowan and Goldhaber (2016) examined the effectiveness of NBPTS-certified teachers compared with other teachers in their schools using a quasi-experimental design in elementary and middle schools in Washington state. The authors compared the academic achievement of students receiving instruction from an NBPTS-certified teacher with that of students receiving instruction from a non-NBPTS-certified teacher. The authors measured mathematics and English language arts achievement using state-required end-of-year standardized tests. The analytic sample (that is, the sample used for study analysis) included 1,312,657 students (110,634 taught by NBPTS-certified teachers and 1,202,023 taught by comparison group teachers) for the mathematics achievement domain and 1,234,924 students (113,129 taught by NBPTS-certified teachers and 1,121,795 taught by comparison group teachers) for the English language arts achievement domain in grades 4-8, from the 2005-06 to 2012-13 school years. Because the authors examined achievement across multiple school years, the reported sample sizes may count some individual students more than once. Cowan and Goldhaber (2016) also reported subgroup findings for school level, certification subject area, English learners, students receiving special education, students eligible for free or reduced-price lunch, and schools with low prior achievement. In addition, they reported subgroup findings for what they described as “apparently random samples” of these same groups of students, in which there was no evidence of students being sorted into particular classrooms based on demographic characteristics. Appendix D reports these supplemental findings, which do not factor into the intervention’s rating of effectiveness.


Fisher and Dickenson (2005) examined the effectiveness of NBPTS-certified teachers compared with other teachers using a quasi-experimental design in elementary and middle schools across South Carolina. The authors compared the academic achievement of students receiving instruction from an NBPTS-certified teacher with that of students receiving instruction from a non-NBPTS-certified teacher. The authors measured mathematics and English language arts achievement using state-required end-of-year standardized tests. Depending on the grade taught, NBPTS-certified teachers had an average of between 13.7 and 17.8 years of experience, whereas comparison group teachers had an average of between 10.4 and 14.1 years of experience. The analytic sample included 3,336 students (1,668 taught by NBPTS-certified teachers and 1,668 taught by comparison group teachers) for the mathematics achievement domain and 3,938 students (1,969 taught by NBPTS-certified teachers and 1,969 taught by comparison group teachers) for the English language arts achievement domain in grades 4-8, during the 2003-04 school year. Fisher and Dickenson (2005) also reported subgroup findings for individual grades and by free or reduced-price lunch eligibility status. Appendix D reports these supplemental findings, which do not factor into the intervention’s rating of effectiveness.


National Board for Professional Teaching Standards Certification February 2018 Page 4


WWC Intervention Report


Gardner (2010) examined the effectiveness of NBPTS-certified teachers compared with other teachers using a quasi-experimental design in nine elementary schools in the Brevard County and Seminole County Public School districts in Florida. The author compared the academic achievement of students receiving instruction from an NBPTS-certified teacher with that of students receiving instruction from a non-NBPTS-certified teacher. The author measured English language arts achievement using the Scholastic Reading Inventory standardized test. The analytic sample included 3,592 students (635 taught by NBPTS-certified teachers with a graduate degree and 3,057 taught by comparison group teachers with a graduate degree) in grade 5, during the 2008-09 school year.


Silver (2007) examined the effectiveness of NBPTS-certified teachers compared with other teachers using a quasi-experimental design in elementary schools in North Carolina. The author compared the academic achievement of students receiving instruction from an NBPTS-certified teacher with that of students receiving instruction from a non-NBPTS-certified teacher. The author measured English language arts achievement using state-required end-of-grade assessments. The analytic sample included 62 teachers (31 NBPTS-certified teachers and 31 comparison group teachers) in grades 3, 4, and 5 during the 2002-03 through 2004-05 school years.


Stephens (2003) examined the effectiveness of NBPTS-certified teachers compared with other teachers using a quasi-experimental design in elementary schools in two large school districts in South Carolina. The author compared the academic achievement of students receiving instruction from an NBPTS-certified teacher with that of students receiving instruction from a non-NBPTS-certified teacher. The author measured mathematics achievement using state-required end-of-year standardized tests. The analytic sample included 153 students (72 taught by NBPTS-certified teachers and 81 taught by comparison group teachers) in grade 4, during the 2001-02 school year.
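Summing the per-study analytic samples described above reproduces the domain totals reported in Tables 1, 3, and 4. The sketch below only re-adds figures stated in this report. Because Silver (2007) reports its analytic sample in teachers rather than students, the English language arts total depends on whether those 62 units are included: without them the sum is 1,242,454 (the figure in Table 1), and with them it is 1,242,516 (the figure in Table 4). As the report notes, these totals may count some students more than once.

```python
# Analytic sample sizes (students) reported for each study, by domain.
math_samples = {
    "Cowan and Goldhaber (2016)": 1_312_657,
    "Fisher and Dickenson (2005)": 3_336,
    "Stephens (2003)": 153,
}
ela_student_samples = {
    "Cowan and Goldhaber (2016)": 1_234_924,
    "Fisher and Dickenson (2005)": 3_938,
    "Gardner (2010)": 3_592,
}
# Silver (2007) reports its analytic sample in teachers, not students.
silver_teachers = 62

math_total = sum(math_samples.values())
ela_total = sum(ela_student_samples.values())
print(math_total)                   # 1316146
print(ela_total)                    # 1242454
print(ela_total + silver_teachers)  # 1242516
```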
Effectiveness Summary


The WWC review of studies of teachers obtaining NBPTS certification for the Teacher Training, Evaluation, and Compensation topic area includes both student and teacher outcomes. The review covers six domains for student outcomes and 11 domains for teacher outcomes. The five studies of NBPTS-certified teachers that met WWC group design standards reported findings in two of the six domains for student outcomes: (1) mathematics achievement and (2) English language arts achievement. The studies did not report any findings that met WWC group design standards in the 11 domains for teacher outcomes. The following findings present the authors’ estimates and WWC-calculated estimates of the size and statistical significance of the effects of NBPTS-certified teachers on students in grades 3-8. Additional comparisons are available as supplemental findings in Appendix D; these supplemental findings do not factor into the intervention’s rating of effectiveness. For a more detailed description of the rating of effectiveness and extent of evidence criteria, see the WWC Rating Criteria on p. 42.


Summary of effectiveness for the mathematics achievement domain 


Table 3. Rating of effectiveness and extent of evidence for the mathematics achievement domain

| Rating of effectiveness | Criteria met |
|---|---|
| Mixed effects: Evidence of inconsistent effects. | In the three studies that reported findings, the estimated impact of the intervention on outcomes in the mathematics achievement domain was positive and statistically significant in one study, and neither statistically significant nor large enough to be substantively important in the other two studies. |

| Extent of evidence | Criteria met |
|---|---|
| Medium to large | Three studies that included 1,316,146 students [a] reported evidence of effectiveness in the mathematics achievement domain. [b] |

[a] The reported sample sizes may count some individual students more than once because some studies examined data from multiple school years.
[b] Stephens (2003) included 12 schools. Cowan and Goldhaber (2016) and Fisher and Dickenson (2005) did not report the number of schools included in their studies.
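The "medium to large" extent of evidence can be checked against explicit thresholds. A minimal sketch, assuming the WWC Procedures and Standards Handbook (version 3.0) criteria: a domain rates medium to large when it includes more than one study, more than one school, and more than 350 students (the Handbook's alternative classroom-count criterion is omitted here); otherwise it rates small. The function name is hypothetical, and the school count used below is only a lower bound, because only Stephens (2003) reported one.

```python
def extent_of_evidence(n_studies: int, n_schools: int, n_students: int) -> str:
    """Classify the extent of evidence for one outcome domain.

    Assumed thresholds (WWC Handbook v3.0): 'Medium to large' needs more
    than one study, more than one school, and more than 350 students;
    anything else is 'Small'. The classroom-count alternative is omitted.
    """
    if n_studies > 1 and n_schools > 1 and n_students > 350:
        return "Medium to large"
    return "Small"

# Mathematics domain: 3 studies and 1,316,146 students. Only Stephens (2003)
# reported a school count (12), so 12 serves as a lower bound here.
print(extent_of_evidence(n_studies=3, n_schools=12, n_students=1_316_146))  # Medium to large
```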


Three studies that meet WWC group design standards with reservations reported findings in the mathematics 
achievement domain. 


Cowan and Goldhaber (2016) examined one outcome in the mathematics achievement domain: the authors created a standardized achievement measure (called a z-score) based on two state standardized assessments administered in different school years (before 2010, the Washington Assessment of Student Learning; thereafter, the Measures of Student Progress). The authors found, and the WWC confirmed, a positive and statistically significant effect of NBPTS-certified teachers on mathematics achievement. The WWC characterizes this study finding as a statistically significant positive effect. Supplemental findings presented in Appendix D do not factor into the intervention’s rating of effectiveness.
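Putting two differently scaled state tests on a common footing is what a z-score does. The report does not spell out Cowan and Goldhaber's exact procedure; a minimal sketch, assuming scores are standardized within each test-and-year group (subtract that group's mean, divide by its standard deviation), looks like this. The test abbreviations, years, scores, and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_within_groups(records):
    """Convert raw scores to z-scores within each (test, year) group.

    records: list of (test, year, raw_score) tuples. Returns z-scores in
    the same order, using each group's mean and population SD.
    """
    groups = defaultdict(list)
    for test, year, score in records:
        groups[(test, year)].append(score)
    group_stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    z_scores = []
    for test, year, score in records:
        mu, sigma = group_stats[(test, year)]
        if sigma == 0:
            raise ValueError(f"no score variation in group {(test, year)}")
        z_scores.append((score - mu) / sigma)
    return z_scores

# Hypothetical raw scores on two tests with very different scales.
records = [
    ("WASL", 2009, 380), ("WASL", 2009, 400), ("WASL", 2009, 420),
    ("MSP", 2011, 2100), ("MSP", 2011, 2200), ("MSP", 2011, 2300),
]
print(standardize_within_groups(records))
```

After standardization, a student at the same relative position on either test receives the same z-score, which is what allows results from both assessment periods to be pooled.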


Fisher and Dickenson (2005) examined one outcome in this domain: the Palmetto Achievement Challenge Test. The authors did not find a statistically significant effect of teachers with NBPTS certification on mathematics achievement. The WWC-calculated average effect size was not large enough to be considered substantively important. The WWC characterizes this study finding as an indeterminate effect. Supplemental findings presented in Appendix D do not factor into the intervention’s rating of effectiveness.


Stephens (2003) examined one outcome in the mathematics achievement domain: the Palmetto Achievement Challenge Test. The author did not find a statistically significant effect of teachers with NBPTS certification on mathematics achievement. The WWC-calculated average effect size was not large enough to be considered substantively important. The WWC characterizes this study finding as an indeterminate effect.
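The phrase "not large enough to be considered substantively important" refers to a threshold on the WWC-calculated effect size. A minimal sketch, assuming the WWC's standard effect-size formula for continuous outcomes (Hedges' g with a small-sample correction) and its 0.25 standard-deviation cutoff for substantive importance. The group means and standard deviations below are hypothetical; only the group sizes (72 and 81) come from Stephens (2003).

```python
import math

SUBSTANTIVE_THRESHOLD = 0.25  # assumed WWC cutoff, in standard deviation units

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Bias-corrected standardized mean difference (Hedges' g)."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    omega = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample correction factor
    return omega * (mean_t - mean_c) / pooled_sd

def substantively_important(g):
    """True if the effect size magnitude reaches the assumed 0.25 cutoff."""
    return abs(g) >= SUBSTANTIVE_THRESHOLD

# Hypothetical scores; only the group sizes match Stephens (2003).
g = hedges_g(mean_t=352.0, sd_t=10.0, n_t=72, mean_c=351.0, sd_c=10.0, n_c=81)
print(round(g, 3), substantively_important(g))
```

An effect that is neither statistically significant nor at least 0.25 in magnitude is what the WWC labels an indeterminate effect.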
Thus, for the mathematics achievement domain, one study showed a statistically significant positive effect and two studies showed indeterminate effects. This results in a rating of mixed effects, with a medium to large extent of evidence.


Summary of effectiveness for the English language arts achievement domain 


Table 4. Rating of effectiveness and extent of evidence for the English language arts achievement domain

| Rating of effectiveness | Criteria met |
|---|---|
| No discernible effects: No affirmative evidence of effects. | In the four studies that reported findings, the estimated impact of the intervention on outcomes in the English language arts achievement domain was neither statistically significant nor large enough to be substantively important. |

| Extent of evidence | Criteria met |
|---|---|
| Medium to large | Four studies that included 1,242,516 students [a] reported evidence of effectiveness in the English language arts achievement domain. [b] |

[a] The reported sample sizes may count some individual students more than once because some studies examined data from multiple school years.
[b] Gardner (2010) included all elementary schools in Brevard County and nine elementary schools in Seminole County. Cowan and Goldhaber (2016), Fisher and Dickenson (2005), and Silver (2007) did not report the number of schools included in their studies.


Four studies that met WWC group design standards with reservations reported findings in the English language arts 
achievement domain. 


Cowan and Goldhaber (2016) examined one outcome in the English language arts achievement domain: the authors combined two state standardized assessments administered in different school years (before 2010, the Washington Assessment of Student Learning; thereafter, the Measures of Student Progress). The authors did not find a statistically significant effect of NBPTS-certified teachers on English language arts achievement. The WWC-calculated average effect size was not large enough to be considered substantively important. The WWC characterizes this study finding as an indeterminate effect. Supplemental findings presented in Appendix D do not factor into the intervention’s rating of effectiveness. As part of these supplemental findings, Cowan and Goldhaber (2016) found, and the WWC confirmed, seven statistically significant positive effects of NBPTS-certified teachers on English language arts achievement for the following student subgroups: (1) students in elementary school classrooms; (2) students eligible for free or reduced-price lunch in elementary school classrooms; (3) students receiving special education in elementary school classrooms; (4) students in middle school classrooms; (5) students in middle school classrooms (analyzed with cohort-by-track fixed effects); (6) students of teachers with Early Adolescence: English Language Arts (EA/ELA) certifications in middle school classrooms; and (7) students of teachers with EA/ELA certifications in middle school classrooms (analyzed with cohort-by-track fixed effects).


Fisher and Dickenson (2005) examined one outcome in this domain: the Palmetto Achievement Challenge Test. The authors did not find a statistically significant effect of NBPTS-certified teachers on English language arts achievement. The WWC-calculated average effect size was not large enough to be considered substantively important. The WWC characterizes this study finding as an indeterminate effect. Supplemental findings presented in Appendix D do not factor into the intervention’s rating of effectiveness. As part of these supplemental findings, Fisher and Dickenson (2005) found, and the WWC confirmed, four statistically significant positive effects for the following student subgroups: (1) grade 4 students, (2) grade 8 students eligible for free or reduced-price lunch, (3) grade 4 students not eligible for free or reduced-price lunch, and (4) grade 7 students not eligible for free or reduced-price lunch.


Gardner (2010) examined one outcome in the English language arts domain: the Scholastic Reading Inventory. The author did not find a statistically significant effect of NBPTS-certified teachers on English language arts achievement. The WWC-calculated average effect size was not large enough to be considered substantively important. The WWC characterizes this study finding as an indeterminate effect.


Silver (2007) examined one outcome: the North Carolina End-of-Grade Reading assessment. The author used both 
the scale scores and the percentage of students meeting proficiency requirements for this measure. The author 
did not find a statistically significant effect of NBPTS-certified teachers on English language arts achievement. The 
WWC-calculated average effect size was not large enough to be considered substantively important. The WWC 
characterizes these study findings as an indeterminate effect. 


Thus, for the English language arts achievement domain, four studies showed indeterminate effects. This results in 
a rating of no discernible effects, with a medium to large extent of evidence. 
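Both domain ratings follow from counting the study-level characterizations. A simplified sketch, assuming the WWC Procedures and Standards Handbook (version 3.0) criteria and covering only the case with no negative findings, which is all this report needs: no statistically significant or substantively important effects yields "no discernible effects," and more indeterminate findings than positive ones yields "mixed effects." Other ratings are deliberately not modeled, and the function and category names are hypothetical.

```python
from collections import Counter

def domain_rating(findings):
    """Simplified WWC-style domain rating from study-level findings.

    Each finding is one of: 'significant positive',
    'substantively important positive', 'significant negative',
    'substantively important negative', or 'indeterminate'.
    Only 'No discernible effects' and the positive-only 'Mixed effects'
    case are modeled; anything else returns a placeholder.
    """
    counts = Counter(findings)
    pos = counts["significant positive"] + counts["substantively important positive"]
    neg = counts["significant negative"] + counts["substantively important negative"]
    ind = counts["indeterminate"]
    if pos == 0 and neg == 0:
        return "No discernible effects"
    if neg == 0 and ind > pos:
        return "Mixed effects"
    return "Other rating (not modeled in this sketch)"

math_findings = ["significant positive", "indeterminate", "indeterminate"]
ela_findings = ["indeterminate"] * 4
print(domain_rating(math_findings))  # Mixed effects
print(domain_rating(ela_findings))   # No discernible effects
```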


References 


Studies that meet WWC group design standards without reservations 
None. 


Studies that meet WWC group design standards with reservations 


Cowan, J., & Goldhaber, D. (2016). National Board certification and teacher effectiveness: Evidence from Washington
state. Journal of Research on Educational Effectiveness, 9(3), 233-258. Retrieved from
https://eric.ed.gov/?id=EJ1106512
Additional source: 

Cowan, J., & Goldhaber, D. (2015). National Board certification and teacher effectiveness: Evidence from Washington.
Technical Report 2015-1, Center for Education Data and Research, Seattle, WA. Retrieved from
https://eric.ed.gov/?id=ED558082

Fisher, S., & Dickenson, T. (2005). A study of the relationship between the National Board Certification status of 
teachers and students’ achievement: Technical report. Columbia: South Carolina Dept. of Education. 

Gardner, D. J. (2010). The effectiveness of state certified, graduate degreed, and National Board certified 
teachers as determined by student growth in reading (Doctoral dissertation). Retrieved from
https://eric.ed.gov/?id=ED522796

Silver, K. T. (2007). The National Board effect: Does the certification process influence student achievement?
(Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3280759)

Stephens, A. D. (2003). The relationship between National Board certification for teachers and student achievement 
(Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3084814) 


Studies that do not meet WWC group design standards 


Abernathy, D. F. (2009). Affluence and influence: A study of inequities in the age of excellence (Doctoral
dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3355826) The study does not
meet WWC group design standards because equivalence of the analytic intervention and comparison groups 
is necessary and not demonstrated. 

Ajimatanrareje, F. (2014). An examination of teacher’s certification or non-certification on students achievement 
(Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3578849) The 
study does not meet WWC group design standards because equivalence of the analytic intervention and 
comparison groups is necessary and not demonstrated. 

Antunez, F. (2015). The effectiveness of the National Board Certification as it relates to the Advanced Placement 
Calculus AB exam (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI 
No. 10154930) The study does not meet WWC group design standards because equivalence of the analytic 
intervention and comparison groups is necessary and not demonstrated. 

Brown, A. L. (2012). The cost effectiveness of a bonus pay plan for National Board Certified teachers in high poverty
elementary schools in an urban school district in Florida (Doctoral dissertation). Available from ProQuest
Dissertations and Theses database. (UMI No. 3569611) The study does not meet WWC group design standards
because equivalence of the analytic intervention and comparison groups is necessary and not demonstrated. 

Buecker, H. L. (2010). Quality teaching in addressing student achievement: A comparative study between National 
Board certified teachers and other teachers on the Kentucky Core Content Test results (Doctoral dissertation). 
Retrieved from https://eric.ed.gov/?id=ED527825 The study does not meet WWC group design standards 
because equivalence of the analytic intervention and comparison groups is necessary and not demonstrated. 

Cantrell, S., Fullerton, J., Kane, T. J., & Staiger, D. O. (2008). National Board Certification and teacher effectiveness: 
Evidence from a random assignment experiment (NBER Working Paper No. 14608). Cambridge, MA: National 
Bureau of Economic Research. Retrieved from https://eric.ed.gov/?id=ED503841 The study does not meet
WWC group design standards because equivalence of the analytic intervention and comparison groups is 
necessary and not demonstrated. 

Cavalluzzo, L. C. (2004). Is National Board Certification an effective signal of teacher quality? Alexandria, VA: CNA
Corporation. Retrieved from https://eric.ed.gov/?id=ED485515 The study does not meet WWC group design 
standards because equivalence of the analytic intervention and comparison groups is necessary and not 
demonstrated. 

Childs, D. E., Jr. (2006). Elementary school National Board certified teachers and student achievement (Doctoral 
dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3224419) The study does 
not meet WWC group design standards because equivalence of the analytic intervention and comparison 
groups is necessary and not demonstrated. 

Chingos, M. M., & Peterson, P. E. (2011). It’s easier to pick a good teacher than to train one: Familiar and new 
results on the correlates of teacher effectiveness. Economics of Education Review, 30(3), 449-465. The study 
does not meet WWC group design standards because equivalence of the analytic intervention and comparison
groups is necessary and not demonstrated.

Clark, S. B. (2012). The effects of National Board Certification on student achievement (Doctoral dissertation). 
Retrieved from https://eric.ed.gov/?id=ED545934 The study does not meet WWC group design standards 
because equivalence of the analytic intervention and comparison groups is necessary and not demonstrated. 

Clotfelter, C. T., Ladd, H., & Vigdor, J. (2007). Teacher credentials and student achievement: Longitudinal analysis
with student fixed effects. Economics of Education Review, 26(6), 673-682. Retrieved from
https://eric.ed.gov/?id=EJ781075 The study does not meet WWC group design standards because equivalence of the
analytic intervention and comparison groups is necessary and not demonstrated. 

Additional sources: 

Clotfelter, C. T., Ladd, H., & Vigdor, J. (2007). How and why do teacher credentials matter for student achievements? 
(CALDER Working Paper 2). Washington, DC: National Center for Analysis of Longitudinal Data in Education 
Research. Retrieved from https://eric.ed.gov/?id=ED509655 

Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher
effectiveness. Journal of Human Resources, 41(4), 778-820. Retrieved from https://eric.ed.gov/?id=EJ750956

Ladd, H., Clotfelter, C., & Vigdor, J. (2007). How and why do teacher credentials matter for student achievements? 
(NBER Working Paper 12828). Cambridge, MA: National Bureau of Economic Research. Retrieved from 
https://eric.ed.gov/?id=ED501923 

Ladd, H. F., Sass, T. R., & Harris, D. N. (2007). The impact of National Board certified teachers on student 
achievement in Florida and North Carolina: A summary of the evidence prepared for the National
Academies Committee on the Evaluation of the Impact of Teacher Certification by NBPTS. Washington, DC:
The National Academies. 

Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2010). Teacher credentials and student achievement in high school: A 
cross-subject analysis with student fixed effects. Journal of Human Resources, 45(3), 655-681. Retrieved from 
https://eric.ed.gov/?id=EJ889247 The study does not meet WWC group design standards because
equivalence of the analytic intervention and comparison groups is necessary and not demonstrated.

Diaz, K. A. (2013). Employing National Board certification practices with all teachers: The potential of cognitive 
coaching and mentoring (Doctoral dissertation). Retrieved from https://eric.ed.gov/?id=ED552760 The study 
does not meet WWC group design standards because the measures of effectiveness cannot be attributed 
solely to the intervention. 

Falaney, P. E. (2007). National Board for Professional Teaching Standards certification: Does it impact student
learning? (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3257510)
The study does not meet WWC group design standards because equivalence of the analytic intervention and 
comparison groups is necessary and not demonstrated. 


Goldhaber, D., & Anthony, E. (2007). Can teacher quality be effectively assessed? National Board Certification as a 
signal of effective teaching. Review of Economics and Statistics, 89(1), 134-150. Retrieved from
https://eric.ed.gov/?id=ED490921 The study does not meet WWC group design standards because equivalence of the analytic
intervention and comparison groups is necessary and not demonstrated. 

Harris, D. N., & Sass, T. R. (2009). The effects of NBPTS-certified teachers on student achievement. Journal of Policy 
Analysis and Management, 28(1), 55-80. Retrieved from https://eric.ed.gov/?id=EJ822730 The study does not meet 
WWC group design standards because equivalence of the analytic intervention and comparison groups is
necessary and not demonstrated.

Additional source: 

Harris, D. N., & Sass, T. R. (2007). The effects of NBPTS-certified teachers on student achievement
(Working Paper 4). Washington, DC: National Center for Analysis of Longitudinal Data in Education Research
(CALDER). Retrieved from https://eric.ed.gov/?id=ED509659 

Ladd, H. F., Sass, T. R., & Harris, D. N. (2007). The impact of National Board certified teachers on student
achievement in Florida and North Carolina: A summary of the evidence prepared for the National
Academies Committee on the Evaluation of the Impact of Teacher Certification by NBPTS. Washington, DC:
The National Academies. 

Helding, K., & Fraser, B. (2013). Effectiveness of National Board Certified (NBC) teachers in terms of
classroom environment, attitudes and achievement among secondary science students. Learning Environments
Research, 16(1), 1-21. Retrieved from https://eric.ed.gov/?id=EJ996744 The study does not meet WWC group 
design standards because equivalence of the analytic intervention and comparison groups is necessary and 
not demonstrated. 

Kitts, A. S. (2011). The relationship of student achievement and level of teacher certification: A quantitative study 
(Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3459673) The 
study does not meet WWC group design standards because equivalence of the analytic intervention and 
comparison groups is necessary and not demonstrated. 

Locklear, R. D. (2013). A comparative study of National Board certified teachers and non-National Board certified 
teachers on student achievement in selected rural elementary schools in North Carolina (Doctoral dissertation). 
Available from ProQuest Dissertations and Theses database. (UMI No. 3581531) The study does not meet 
WWC group design standards because equivalence of the analytic intervention and comparison groups is 
necessary and not demonstrated. 

McColskey, W., Stronge, J. H., Ward, T. J., Tucker, P. D., Howard, B., Lewis, K., & Hindman, J. L. (2005). Teacher 
effectiveness, student achievement, and National Board Certified teachers. Arlington, VA: National Board for 
Professional Teaching Standards. The study does not meet WWC group design standards because equiva- 
lence of the analytic intervention and comparison groups is necessary and not demonstrated. 

McCullough, M. T. (2011). Impact of National Board certification, advanced degree, and socio-economic status on 
the literacy achievement rate of 11th grade students in Arkansas (Doctoral dissertation). Retrieved from
https://eric.ed.gov/?id=ED535894 The study does not meet WWC group design standards because equivalence of
the analytic intervention and comparison groups is necessary and not demonstrated. 

McRae, J. S. (2014). Advancing the science of hiring teachers: An analysis of the effects of teacher characteristics 
on student achievement (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. 
(UMI No. 3682166) The study does not meet WWC group design standards because equivalence of the ana- 
lytic intervention and comparison groups is necessary and not demonstrated. 

Morgigno, R. C. (2012). The effects of National Board certified teachers on student achievement in Mississippi high
schools (Doctoral dissertation). Retrieved from https://eric.ed.gov/?id=ED547197 The study does not meet
WWC group design standards because equivalence of the analytic intervention and comparison groups is
necessary and not demonstrated.
Rouse, W. A. (2004). An examination of student test results: National Board-Certified teachers and non-National
Board-Certified teachers (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. 
(UMI No. 3120274) The study does not meet WWC group design standards because equivalence of the ana- 
lytic intervention and comparison groups is necessary and not demonstrated. 
Rouse, W., & Hollomon, H. L. (2005). A comparison of student test results: Business and marketing education
National Board Certified teachers and non-National Board teachers. The Delta Pi Epsilon Journal, 47(3),
128-142. Retrieved from https://eric.ed.gov/?id=EJ748223

Rouse, W. A., Jr. (2008). National Board Certified teachers are making a difference in student
achievement: Myth or fact? Leadership and Policy in Schools, 7(1), 64-86. Retrieved from
https://eric.ed.gov/?id=EJ811558

Saderholm, J. (2007). Science inquiry learning environments created by National Board certified teachers (Doctoral 
dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3286743) The study does 
not meet WWC group design standards because equivalence of the analytic intervention and comparison 
groups is necessary and not demonstrated. 

Sanders, W. L., Ashton, J. J., & Wright, S. P. (2005). Comparison of the effects of NBPTS certified teachers with 
other teachers on the rate of student academic progress. Final report. Arlington, VA: National Board for Profes- 
sional Teaching Standards. Retrieved from https://eric.ed.gov/?id=ED491846 The study does not meet WWC 
group design standards because equivalence of the analytic intervention and comparison groups is necessary 
and not demonstrated. 

Sato, M., Ruth, C. W., & Darling-Hammond, L. (2008). Improving teachers’ assessment practices through profes- 
sional development: The case of National Board Certification. American Educational Research Journal, 45(3),
669-700. Retrieved from https://eric.ed.gov/?id=EJ807296 The study does not meet WWC group design 
standards because equivalence of the analytic intervention and comparison groups is necessary and not 
demonstrated. 

Smith, T. W., Appalachian State University Office for Research on Teaching. (2005). An examination of the rela- 
tionship between depth of student learning and National Board Certification status. Boone, NC: Office for 
Research on Teaching, Appalachian State University. The study does not meet WWC group design standards 
because equivalence of the analytic intervention and comparison groups is necessary and not demonstrated. 

Strobel, T. L. (2011). The effect of National Board Certification on student achievement in career and technology 
education (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 
3454027) The study does not meet WWC group design standards because equivalence of the analytic inter- 
vention and comparison groups is necessary and not demonstrated. 

Stronge, J. H., Ward, T. J., Tucker, P. D., Hindman, J. L., McColskey, W., & Howard, B. (2007). National Board
certified teachers and non-National Board certified teachers: Is there a difference in teacher effectiveness and
student achievement? Journal of Personnel Evaluation in Education, 20(3-4), 185-210. Retrieved from
https://eric.ed.gov/?id=EJ789880 The study does not meet WWC group design standards because equivalence of
the analytic intervention and comparison groups is necessary and not demonstrated.

Vandevoort, L. G., Amrein-Beardsley, A., & Berliner, D. C. (2004). National Board certified teachers and
their students’ achievement. Education Policy Analysis Archives, 12(46). Retrieved from
https://eric.ed.gov/?id=EJ853513 The study does not meet WWC group design standards because equivalence of the
analytic intervention and comparison groups is necessary and not demonstrated.

Additional source: 
Vandevoort, L. G. (2004). National Board certified teachers and student achievement (Doctoral dissertation). 
Available from ProQuest Dissertations and Theses database. (UMI No. 3123636) 
Vitale, T. M. (2008). What is the relationship between National Board Certification and the achievement results of third
grade students in a local central Florida school district? (Doctoral dissertation). Available from ProQuest Disserta- 
Welborn, T. M. (2016). Do students that have a National Board certified teacher have higher scores on standardized
achievement tests in Mississippi? (Doctoral dissertation, Mississippi College). The study does not meet WWC 
group design standards because equivalence of the analytic intervention and comparison groups is necessary 
and not demonstrated. 


Study that does not meet WWC pilot regression-discontinuity design standards 
Goldhaber, D., & Hansen, M. (2009). National Board certification and teachers’ career paths: Does NBPTS certifi- 
cation influence how long teachers remain in the profession and where they teach? Education Finance and 
Policy, 4(3), 229-262. Retrieved from https://eric.ed.gov/?id=EJ849857 The study does not meet WWC pilot 
regression discontinuity design standards because it has high or unknown levels of attrition and does not 
demonstrate continuity of the outcome-forcing variable relationship. 


Studies that are ineligible for review using the Teacher Training, Evaluation, and Compensation Evidence Review 
Protocol 

Adams, A. (2016). Teacher leadership: A little less conversation, A little more action research (Doctoral dissertation). 
Available from ProQuest Dissertations and Theses database. (UMI No. 10107569) This study is ineligible for 
review because it is out of scope of the protocol. 

Allen, P. R. (2012). Understanding the relationship between students’ reading achievement and teachers’ self-regu- 
lation patterns in grades K-3 (Doctoral dissertation). Available from ProQuest Dissertations and Theses data- 
base. (UMI No. 3578849) This study is ineligible for review because it is out of scope of the protocol. 

Amos, J. L. (2013). Supporting teachers: The role of reflection in professional learning (Doctoral dissertation).
Retrieved from https://eric.ed.gov/?id=ED552435 This study is ineligible for review because it is out of scope
of the protocol.

ProQuest Dissertations and Theses database. (UMI No. 3211667) This study is ineligible for review because it
does not use an eligible design.

Angulo, S. R. (2010). Highly qualified: The perceptions of student learning and pedagogy related to mathematics of 
National Board certified teachers of urban Latino students (Doctoral dissertation). Retrieved from
https://eric.ed.gov/?id=ED519381 This study is ineligible for review because it does not use an eligible design.

Bailey, A. T. (2010). Leadership skills of North Carolina principals with certification from the National Board of 
Professional Teaching Standards (Doctoral dissertation). Available from ProQuest Dissertations and Theses 
database. (UMI No. 3415796) This study is ineligible for review because it does not use an eligible design. 

Balbach, A. B. M. (2012). A survey of Pennsylvania school principals’ perceptions of the National Board for Profes- 
sional Teaching Standards certification process and the leadership roles of National Board certified teachers 
(Doctoral dissertation). Retrieved from https://eric.ed.gov/?id=ED546678 This study is ineligible for review 
because it does not use an eligible design. 

Baratz-Snowden, J. (1993). Assessment of teachers: A view from the National Board for Professional Teaching 
Standards. Theory into Practice, 32(2), 82-85. Retrieved from https://eric.ed.gov/?id=EJ467924 This study is 
ineligible for review because it does not use an eligible design. 

Beck, L. D. (2009). The current state of professional development in Appalachia (Doctoral dissertation). Available 
from ProQuest Dissertations and Theses database. (UMI No. 3380502) This study is ineligible for review 
because it is out of scope of the protocol. 
Belson, S. I., & Husted, T. A. (2015). Impact of National Board for the Professional Teaching Standards certifi-
Bryant, A. J. (2010). Perception of high-stakes testing by National Board certified teachers (Doctoral dissertation).
Available from ProQuest Dissertations and Theses database. (UMI No. 3407615) This study is ineligible for 
review because it is out of scope of the protocol. 
achievement (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No.
Cannata, M., McCrory, R., Sykes, G., Anagnostopoulos, D., & Frank, K. A. (2010). Exploring the influence of
National Board certified teachers in their schools and beyond. Educational Administration Quarterly, 46(4), 
Appendix A.1: Research details for Cowan and Goldhaber (2016)


Cowan, J., & Goldhaber, D. (2016). National Board certification and teacher effectiveness: Evidence
from Washington state. Journal of Research on Educational Effectiveness, 9(3), 233-258.⁸


Table A1. Summary of findings: Meets WWC Group Design Standards With Reservations

Study findings
Outcome domain | Sample size | Average improvement index (percentile points) | Statistically significant
Mathematics achievement | 1,312,657 students | +2 | Yes
English language arts achievement | 1,234,924 students | +1 | No


Setting
This study was conducted in elementary and middle school grades throughout Washington state.


Study sample
This study examined two groups of students: students in elementary school classrooms, defined as self-contained classes (primarily grades 3-5, with some sixth-grade classes), and students in middle school classrooms, defined as non-self-contained classes (primarily grades 7 and 8, with some sixth-grade classes). Students in elementary school classes were examined from the 2005-06 through the 2012-13 school years, and students in middle school classes from the 2009-10 through the 2012-13 school years. The analytic sample for the mathematics scores includes 110,634 students taught by NBPTS-certified teachers and 1,202,023 students taught by comparison teachers. The analytic sample for the English language arts scores includes 113,129 students taught by NBPTS-certified teachers and 1,121,795 students taught by comparison teachers. Because the study spans multiple school years, individual students may be counted more than once in these sample sizes. Demographics are not provided for the full sample of elementary and middle school students. The WWC-calculated weighted averages of the elementary and middle school mathematics samples suggest that 49% of students in the analytic sample were female; about 63% were White, 17% Hispanic, 9% Asian, 5% Black, 5% multiracial, and 2% American Indian.⁹ About 5% of students had limited English proficiency, 6% had a learning disability, and 46% were eligible for free or reduced-price lunch.


In addition, the authors present subgroup findings by school level (elementary or middle school classrooms); NBPTS certification area (Middle Childhood: Generalist [MC/Gen], Early/Middle Childhood: Literacy, Reading, and Language Arts [EMC/LRLA], Early Adolescence: English Language Arts [EA/ELA], and Early Adolescence: Math [EA/Math]); special education status; eligibility for free or reduced-price lunch; and school poverty level (Challenging Schools Bonus vs. non-Challenging Schools Bonus schools). The subgroup findings are reported in Appendix D.¹⁰ The supplemental findings do not factor into the intervention’s rating of effectiveness.


Intervention group
The intervention consisted of regular instruction for 1 year by an NBPTS-certified teacher.
Comparison group
The comparison consisted of regular instruction for 1 year by a teacher who was not NBPTS-certified.


Outcomes and measurement
This study examined one outcome in the mathematics achievement domain and one outcome in the English language arts achievement domain. In any given year, both outcomes were measured with the same instrument, but the instrument changed during the study: before spring 2010, student achievement was measured with the Washington Assessment of Student Learning, which was replaced by the Measures of Student Progress assessment in spring 2010. The outcomes were standardized, and the analysis included cohort fixed effects. For a more detailed description of these outcome measures, see Appendix B.
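Because the Washington Assessment of Student Learning and the Measures of Student Progress report scores on different scales, the scores must be put on a common metric before they can be pooled across years. The report does not describe the exact procedure. The sketch below assumes one common approach: converting each scale score to a z-score within its test-by-year-by-grade cell. The function name and record fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_cells(records):
    """Return copies of `records` with a "z" field added.

    Each record is a dict with hypothetical keys "test", "year", "grade",
    and "score". Scores are standardized within each (test, year, grade)
    cell so that every cell has mean 0 and standard deviation 1. This puts
    results from different assessments on a common scale (an assumed
    procedure, not one documented in the study).
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["test"], r["year"], r["grade"])].append(r["score"])

    # Mean and population standard deviation for each cell.
    stats = {key: (mean(scores), pstdev(scores)) for key, scores in cells.items()}

    standardized = []
    for r in records:
        mu, sd = stats[(r["test"], r["year"], r["grade"])]
        # A cell with no spread (e.g., a single student) gets z = 0.
        z = (r["score"] - mu) / sd if sd > 0 else 0.0
        standardized.append({**r, "z": z})
    return standardized


# Hypothetical scores from two tests reported on different scales.
example = [
    {"test": "WASL", "year": 2009, "grade": 4, "score": 390},
    {"test": "WASL", "year": 2009, "grade": 4, "score": 410},
    {"test": "MSP", "year": 2011, "grade": 4, "score": 200},
    {"test": "MSP", "year": 2011, "grade": 4, "score": 210},
]
for row in standardize_within_cells(example):
    print(row["test"], row["score"], round(row["z"], 2))  # z is -1.0 or 1.0 in each cell
```

Using the population standard deviation (`pstdev`) instead of the sample version is a design choice. With statewide cells of thousands of students, the two give nearly identical results.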


Support for implementation
Teachers are offered incentives to become NBPTS-certified, and certified teachers are also offered financial incentives to teach in lower performing schools. Before 2008, Washington state provided a $3,500 salary incentive for certified teachers, which increased to $5,000 in 2008. Also starting in 2008, NBPTS-certified teachers in Washington state were offered a $5,000 incentive to teach in lower performing schools. Individual school districts may offer additional incentives such as financial support, release time for certification activities, and mentoring.


Appendix A.2: Research details for Fisher and Dickenson (2005) 


Fisher, S., & Dickenson, T. (2005). A study of the relationship between the National Board certification 
status of teachers and students’ achievement: Technical report. Columbia: South Carolina Dept. 
of Education. 


Table A2. Summary of findings: Meets WWC Group Design Standards With Reservations

Study findings
Outcome domain | Sample size | Average improvement index (percentile points) | Statistically significant
Mathematics achievement | 288 teachers/3,336 students | +2 | No
English language arts achievement | 406 teachers/3,938 students | +4 | No


Setting
This study was conducted in elementary and middle school grades throughout South Carolina.


Study sample
This study examined students in grades 4-8 using a quasi-experimental matched-comparison design. NBPTS-certified teachers who taught math or English language arts in grades 4-8 were matched with non-certified teachers who had similar years of teaching experience and who taught in schools with poverty levels and student/teacher ratios similar to those of the NBPTS-certified teachers’ schools. Non-certified teachers who taught in a school with an NBPTS-certified or NBPTS-applicant teacher were excluded from the comparison group, because they might benefit from working collaboratively with certified teachers or applicants. The analytic sample for the mathematics scores includes 1,668 students taught by 144 NBPTS-certified teachers and 1,668 students taught by 144 comparison teachers. The analytic sample for the English language arts scores includes 1,969 students taught by 187 NBPTS-certified teachers and 1,969 students taught by 187 comparison teachers. Approximately 47% of students received free or reduced-price lunch.
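The matching rule described above can be illustrated with a small sketch. The report does not say which matching algorithm Fisher and Dickenson (2005) used. The code below assumes two steps: first, exclude non-certified teachers whose schools also have a certified or applicant teacher; second, pair the remaining teachers with certified teachers by greedy one-to-one nearest-neighbor matching without replacement on the three matching variables. The `Teacher` fields, scale factors, and function names are illustrative and are not taken from the study.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Teacher:
    teacher_id: str
    school_id: str
    years_experience: float
    school_poverty: float          # hypothetical school poverty measure
    student_teacher_ratio: float


def eligible_comparisons(pool, excluded_schools):
    """Keep non-certified teachers whose school has no certified or applicant teacher."""
    return [t for t in pool if t.school_id not in excluded_schools]


def distance(a, b, scales):
    """Euclidean distance on the three matching variables, each divided by a scale factor."""
    diffs = (
        (a.years_experience - b.years_experience) / scales[0],
        (a.school_poverty - b.school_poverty) / scales[1],
        (a.student_teacher_ratio - b.student_teacher_ratio) / scales[2],
    )
    return sqrt(sum(d * d for d in diffs))


def greedy_match(certified, comparisons, scales=(1.0, 1.0, 1.0)):
    """Pair each certified teacher with the nearest unused comparison teacher (1:1, no replacement)."""
    available = list(comparisons)
    pairs = []
    for t in certified:
        if not available:
            break  # more certified teachers than eligible comparison teachers
        best = min(available, key=lambda c: distance(t, c, scales))
        pairs.append((t.teacher_id, best.teacher_id))
        available.remove(best)
    return pairs


certified = [Teacher("T1", "S1", 10, 50, 15), Teacher("T2", "S2", 20, 30, 18)]
pool = [
    Teacher("C1", "S1", 10, 50, 15),  # excluded: shares a school with a certified teacher
    Teacher("C2", "S3", 11, 48, 15),
    Teacher("C3", "S4", 19, 31, 18),
    Teacher("C4", "S5", 5, 80, 12),
]
# In the study, schools with NBPTS applicants would also be added to this set.
comparisons = eligible_comparisons(pool, {t.school_id for t in certified})
print(greedy_match(certified, comparisons))  # [('T1', 'C2'), ('T2', 'C3')]
```

Greedy matching depends on the order in which certified teachers are processed. An optimal-matching approach would instead minimize the total distance across all pairs.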
In addition, the authors present subgroup findings by grade (4, 5, 6, 7, or 8) and by whether students were eligible for free or reduced-price lunch (eligible or not eligible). The subgroup findings are reported in Appendix D.¹¹ The supplemental findings do not factor into the intervention’s rating of effectiveness.


Intervention group
The intervention consisted of regular instruction in mathematics or English language arts for 1 year by a teacher with NBPTS certification. Depending on the grade taught, NBPTS-certified teachers had an average of between 13.7 and 17.8 years of experience.


Comparison group
The comparison consisted of regular instruction in mathematics or English language arts for 1 year by a teacher who was not NBPTS-certified. Depending on the grade taught, non-certified teachers had an average of between 10.4 and 14.1 years of experience.


Outcomes and measurement
This study examined two outcomes, mathematics achievement and English language arts achievement. Both outcomes were measured using the Palmetto Achievement Challenge Test. For a more detailed description of this outcome measure, see Appendix B.


Support for implementation
NBPTS-certified teachers automatically received the equivalent of 12 credit hours toward renewal of their teaching certificates, additional annual pay while maintaining NBPTS certification, and forgiveness of any loans used to pay the application fee.


Appendix A.3: Research details for Gardner (2010) 


Gardner, D. J. (2010). The effectiveness of state certified, graduate degreed, and National Board certi- 
fied teachers as determined by student growth in reading (Doctoral dissertation). Available from 
ProQuest Dissertations and Theses database. (UMI No. 3415029) 


Table A3. Summary of findings: Meets WWC Group Design Standards With Reservations

Study findings
Outcome domain | Sample size | Average improvement index (percentile points) | Statistically significant
English language arts achievement | 3,592 students | 0 | No


Setting
This study took place in two public school districts in Florida: all elementary schools in Brevard County Public Schools and nine elementary schools in Seminole County Public Schools participated.


Study sample
The students included in this study were in grades 3-5 during the 2008-09 school year in Florida. The analytic sample for the English language arts scores includes 535 students taught by NBPTS-certified teachers and 3,057 students taught by comparison teachers. About 70% of students were White, 12% Black, 9% Hispanic, 6% of mixed race, and 3% Asian. About 51% were male, less than 3% were English learners, and about 35% qualified for free or reduced-price lunch.


In addition, the author presents subgroup findings by grade (3, 4, or 5) and by the highest degree 
obtained by the teacher (bachelor’s or graduate). The subgroup findings are reported in Appen- 
dix D. The supplemental findings do not factor into the intervention’s rating of effectiveness. 
Intervention group
The intervention condition was receiving 1 year of instruction from a teacher with NBPTS certification.


Comparison group
The comparison condition was receiving 1 year of instruction from teachers without NBPTS certification.


Outcomes and measurement
This study measured English language arts achievement using the Scholastic Reading Inventory, which was administered at the beginning of the school year and again at the end of April. For a more detailed description of this outcome measure, see Appendix B.


Support for implementation
The study notes that the state of Florida provides a salary bonus to teachers who achieve NBPTS certification but provides no details on this bonus system.


Appendix A.4: Research details for Silver (2007) 


Silver, K. T. (2007). The National Board effect: Does the certification process influence student achieve- 
ment? (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI 
No. 3280759) 


Table A4. Summary of findings: Meets WWC Group Design Standards With Reservations

Study findings
Outcome domain | Sample size | Average improvement index (percentile points) | Statistically significant
English language arts achievement | 62 teachers | +1 | No


Setting
This study was conducted in elementary school grades 3-5 throughout North Carolina.


Study sample
The study examined the effect of NBPTS-certified teachers in the first year after they received certification. The author identified 81 teachers in grades 3-5 who received NBPTS certification in the 2003-04 school year and matched them to 81 comparison teachers without NBPTS certification based on teaching experience, degree level, grade level taught, and school district. Approximately 90% of the teachers were White, 8% were Black, 1% were Hispanic, and less than 1% were Native American; 95% were female; and 72% held bachelor’s degrees. The analytic sample included 31 NBPTS-certified teachers and 31 comparison teachers without NBPTS certification.


In addition, the author presents subgroup findings by grade (3, 4, or 5). The subgroup findings are reported in Appendix D. The supplemental findings do not factor into the intervention’s rating of effectiveness.


Intervention group
The intervention condition was receiving 1 year of instruction during the 2004-05 school year from a teacher who received NBPTS certification in the prior school year.
Comparison group
The comparison condition was receiving 1 year of instruction during the 2004-05 school year from teachers without NBPTS certification.


Outcomes and measurement
This study measured English language arts achievement using the North Carolina End-of-Grade reading assessment, a state-required test given to all North Carolina public school students in grades 3-8. The author examined the raw score on this assessment, as well as the percentage of students scoring above the threshold required to be considered proficient by North Carolina standards.¹² For a more detailed description of this outcome measure, see Appendix B.


Support for implementation
Teachers who obtain NBPTS certification in North Carolina receive a 12% salary supplement.


Appendix A.5: Research details for Stephens (2003) 


Stephens, A. D. (2003). The relationship between National Board certification for teachers and student 
achievement (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. 
(UMI No. 3084814) 


Table A5. Summary of findings: Meets WWC Group Design Standards With Reservations

Study findings
Outcome domain | Sample size | Average improvement index (percentile points) | Statistically significant
Mathematics achievement | 22 teachers/153 students | 0 | No


Setting
This study took place in elementary school grades 4 and 5 in two large school districts in South Carolina. One district was described as suburban, with a total of 14,759 students across 36 schools. The second district contained urban, suburban, and rural schools, with a total of 42,446 students across 85 schools.


Study sample
This study individually matched each of eight teachers with NBPTS certification to a teacher without certification. Four of the NBPTS-certified teachers taught grade 4 and four taught grade 5. Teachers were matched on the prior-year mathematics achievement of the students they taught in the study year and to within a range on the school-level poverty index. Intervention and comparison teachers were drawn from within each participating school district. The analytic sample includes 72 students taught by the four NBPTS-certified teachers and 81 students taught by the four comparison teachers. The race, gender, and free or reduced-price lunch status of students were not reported. Across all matches, the school poverty index ranged from 14.2 to 98.5.


The author presented separate comparisons for each NBPTS-certified teacher. Each of these contrasts has a confounding factor, because the intervention condition was delivered by a single teacher. The WWC sent an author query asking whether aggregate findings were available. Because the author did not have aggregated findings, the WWC aggregated the four contrasts for each grade and used these aggregated findings as the contrasts of interest for this review.
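The report does not describe how the WWC aggregated the four single-teacher contrasts within a grade. One standard way to combine summary statistics is to pool the sample sizes, means, and standard deviations of the four classrooms in each condition into a single group before computing an effect size. It is shown here as an assumption, not as the WWC's documented procedure. The function name is hypothetical.

```python
from math import sqrt


def combine_groups(groups):
    """Pool (n, mean, sd) summaries from several groups into one (n, mean, sd).

    The pooled variance uses the total sum of squares (the within-group
    sum of squares plus the between-group sum of squares) divided by
    N - 1, i.e., the sample variance of the combined group.
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return n_total, grand_mean, sqrt((ss_within + ss_between) / (n_total - 1))


# Two hypothetical classrooms with raw scores [0, 2] and [4, 6]. The pooled
# result has n = 4, mean 3.0, and SD sqrt(20/3), the same as the raw scores [0, 2, 4, 6].
print(combine_groups([(2, 1.0, sqrt(2)), (2, 5.0, sqrt(2))]))
```

Applied to one grade, the four intervention classrooms and the four comparison classrooms would each be pooled this way. A single effect size would then be computed from the two pooled groups.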
Intervention group
The intervention condition was receiving 1 year of instruction in math during the 2001-02 school year from a teacher with NBPTS certification. Each teacher had at least 3 years of experience.


Comparison group
The comparison condition was receiving 1 year of instruction in math during the 2001-02 school year from a teacher without NBPTS certification. Each teacher had at least 3 years of experience.


Outcomes and measurement
This study measured mathematics achievement using the Palmetto Achievement Challenge Test, a state-required standardized assessment. For a more detailed description of this outcome measure, see Appendix B.


Support for implementation
The state of South Carolina provided a $7,500 bonus for NBPTS certification. The two participating school districts provided salary stipends and/or compensation to teachers achieving NBPTS certification; the study provides no details on these incentives.
Appendix B: Outcome measures for each domain


Mathematics achievement

Palmetto Achievement Challenge Test
Fisher and Dickenson (2005) used this state assessment to measure achievement for students in grades 4-8. Scaled scores from the 2004 administration were used as the outcome (as cited in Fisher & Dickenson, 2005). Stephens (2003) also used this assessment to measure achievement for students in school years 2000-01 and 2001-02 (as cited in Stephens, 2003). Statewide, the average scale score in each grade is 100 times the grade level, such as 400 for grade 4 and 800 for grade 8 (Fisher & Dickenson, 2005).

Standardized Math Test
Cowan and Goldhaber (2016) created a standardized math score using the Measures of Student Progress and the Washington Assessment of Student Learning for students in grades 3-8. The Washington Assessment of Student Learning was used for school years 2006-07 through fall 2009-10. The Measures of Student Progress was used from spring of school year 2009-10 through school year 2012-13 (as cited in Cowan & Goldhaber, 2016).


English language arts achievement

North Carolina End-of-Grade Reading Assessment
Silver (2007) used the state-required end-of-grade reading assessment in North Carolina for students in grades 3-5. This multiple-choice test is aligned to the North Carolina Standard Course of Study and is given to all public school students in North Carolina in grades 3-8. The average test-retest reliability was .86, and internal consistency ranged from .90 to .94. This outcome was examined in scale score units and as the percentage of students meeting proficiency standards for each grade (as cited in Silver, 2007).

Palmetto Achievement Challenge Test
Fisher and Dickenson (2005) used this state assessment to measure achievement for students in grades 4-8. Scaled scores from the 2004 administration were used as the outcome. Statewide, the average scale score in each grade is 100 times the grade level, such as 400 for grade 4 and 800 for grade 8 (Fisher & Dickenson, 2005).

Scholastic Reading Inventory
Gardner (2010) measured English language arts achievement for students in grades 3-5 using the Lexile measure from the Scholastic Reading Inventory (SRI). The Lexile measure is nationally normed, ranges from 0L to 2000L, and provides a metric for assessing reading growth over time. The SRI is a computer-administered reading comprehension assessment in which students read brief passages and answer questions about the content; it has been externally validated for construct and criterion-related validity (as cited in Gardner, 2010).

Standardized English Language Arts Test
Cowan and Goldhaber (2016) created a standardized English language arts score using the Measures of Student Progress and the Washington Assessment of Student Learning for students in grades 3-8. The Washington Assessment of Student Learning was used for school years 2006-07 through fall 2009-10. The Measures of Student Progress was used from spring of school year 2009-10 through school year 2012-13 (as cited in Cowan & Goldhaber, 2016).
Appendix C.1: Findings included in the rating for the mathematics achievement domain

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Cowan & Goldhaber (2016)ᵃ
Standardized Math Test | Elementary and middle school students | 15,556 teachers/1,312,657 students | 0.03 (1.02) | -0.01 (0.99) | 0.04 | 0.04 | +2 | < .01
Domain average for mathematics achievement (Cowan & Goldhaber, 2016) | 0.04 | +2 | Statistically significant

Fisher & Dickenson (2005)ᵇ
Palmetto Achievement Challenge Test | Grades 4-8 | 288 teachers/3,336 students | 0.05 (1.00) | 0.00 (1.00) | 0.05 | 0.05 | +2 | Al
Domain average for mathematics achievement (Fisher & Dickenson, 2005) | 0.05 | +2 | Not statistically significant

Stephens (2003)ᶜ
Palmetto Achievement Challenge Test | Grade 4 | 8 teachers/153 students | 421.66 (13.78) | 421.51 (13.16) | 0.15 | 0.01 | 0 | .98
Domain average for mathematics achievement (Stephens, 2003) | 0.01 | 0 | Not statistically significant


Domain average for mathematics achievement across all studies | 0.03 | +1 | na


Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors 
the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who 
are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change 
in an average individual’s percentile rank that can be expected if the individual is given the intervention. The WWC-computed average effect size is a simple average rounded to 
two decimal places; the average improvement index is calculated from the average effect size. The statistical significance of each study’s domain average was determined by the 
WWC. Some statistics may not sum as expected due to rounding. na = not applicable. 


ᵃ For Cowan and Goldhaber (2016), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The effect size 
was calculated using the ordinary least-squares (OLS) coefficient. The single finding presented here is based on an aggregated sample of elementary and middle school students 
separately reported in the original study. The authors provided unadjusted baseline and post-intervention means and standard deviations for the outcome at the WWC’s request. The 
authors reported p-values for some results, but not for the aggregated analysis. The WWC applied a correction for clustering and calculated the p-value reported in the table. This 
study is characterized as having a statistically significant positive effect because the estimated effect for the one measure in this domain is positive and statistically significant. For 
more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26. 


ᵇ For Fisher and Dickenson (2005), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The effect size was 
calculated using the unadjusted mean and standard deviation calculation. The single finding presented here is based on an aggregated sample of students in grades 4-8 reported 
separately by grade in the study. Because the outcome measure was not scaled to allow direct comparisons of scores across grades, the WWC standardized the scores and removed 
between-grade variation in the outcome means prior to aggregating across grades. The authors reported p-values for some results, but not for the aggregated analysis. The WWC 
applied a correction for clustering and calculated the p-value reported in the table. This study is characterized as having an indeterminate effect because the estimated effect for the 
one measure in this domain is neither statistically significant nor substantively important. For more information, please refer to the WWC Procedures and Standards Handbook (version 
3.0), p. 26. 


ᶜ For Stephens (2003), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The single finding presented 
here is based on an aggregated sample of grade 4 teachers and their students, which were reported separately by teacher in the original study. The effect size was calculated using 
the unadjusted mean and standard deviation calculation. The author reported p-values for some results, but not for the aggregated analysis. The WWC applied a correction for clustering 
and calculated the p-value reported in the table. This study is characterized as having an indeterminate effect because the estimated effect for the one measure in this domain is 
neither statistically significant nor substantively important. For more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26.
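
The table notes describe two calculations that can be reproduced from the reported effect sizes: the improvement index, which converts an effect size into an expected percentile-rank change for an average individual, and the domain average, a simple (unweighted) mean of study-level effect sizes rounded to two decimal places. The Python sketch below assumes the conversion improvement index = 100 × Φ(g) − 50, where Φ is the standard normal cumulative distribution function; the function names are illustrative and are not part of any WWC tool.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def improvement_index(effect_size: float) -> int:
    """Percentile-point change for an average individual, rounded to a whole number.

    Assumed conversion: 100 * Phi(g) - 50, i.e., the percentile rank of the average
    intervention-group member in the comparison distribution, minus 50.
    """
    return round(100.0 * normal_cdf(effect_size) - 50.0)


def domain_average(effect_sizes: list[float]) -> float:
    """Simple (unweighted) average of study-level effect sizes, rounded to 2 decimals."""
    return round(sum(effect_sizes) / len(effect_sizes), 2)


# Study-level domain averages from Appendix C.1 and C.2.
math_es = [0.04, 0.05, 0.01]       # Cowan & Goldhaber; Fisher & Dickenson; Stephens
ela_es = [0.02, 0.10, 0.01, 0.04]  # Cowan & Goldhaber; Fisher & Dickenson; Gardner; Silver

for label, es in [("Mathematics", math_es), ("English language arts", ela_es)]:
    avg = domain_average(es)
    print(f"{label}: average effect size {avg:.2f}, improvement index {improvement_index(avg):+d}")
```

Run as written, this gives average effect sizes of 0.03 and 0.04 and improvement indexes of +1 and +2, matching the averages reported for the two domains in Table 1.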
Appendix C.2: Findings included in the rating for the English language arts achievement domain

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Cowan & Goldhaber (2016)ᵃ
Standardized English Language Arts Test | Elementary and middle school students | 16,081 teachers/1,234,924 students | 0.03 (0.97) | 0.02 (0.99) | 0.01 | 0.02 | +1 | .24
Domain average for English language arts achievement (Cowan & Goldhaber, 2016) | 0.02 | +1 | Not statistically significant

Fisher & Dickenson (2005)ᵇ
Palmetto Achievement Challenge Test | Grades 4-8 | 374 teachers/3,938 students | 0.10 (1.00) | 0.00 (1.00) | 0.10 | 0.10 | +4 | .07
Domain average for English language arts achievement (Fisher & Dickenson, 2005) | 0.10 | +4 | Not statistically significant

Gardner (2010)ᶜ
Scholastic Reading Inventory | Grade 5 students of teachers with a bachelor’s degree | 3,592 students | 923.93 (218.03) | 921.47 (221.12) | 2.46 | 0.01 | 0 | .81
Domain average for English language arts achievement (Gardner, 2010) | 0.01 | 0 | Not statistically significant

Silver (2007)ᵈ
North Carolina End-of-Grade Reading Assessment | Grade 4 teachers | 62 teachers | 252.91 (3.74) | 252.92 (3.98) | -0.01 | -0.00 | 0 | .99
Percent proficient on North Carolina End-of-Grade Reading Assessment | Grade 4 teachers | 62 teachers | 84.96 (na) | 84.10 (na) | 0.86 | 0.07 | +3 | Al
Domain average for English language arts achievement (Silver, 2007) | 0.04 | +1 | Not statistically significant


Domain average for English language arts achievement across all studies | 0.04 | +2 | na


Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors 
the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who 
are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change 
in an average individual’s percentile rank that can be expected if the individual is given the intervention. The WWC-computed average effect size is a simple average rounded to 
two decimal places; the average improvement index is calculated from the average effect size. The statistical significance of each study’s domain average was determined by the 
WWC. Some statistics may not sum as expected due to rounding. na = not applicable. 


ᵃ For Cowan and Goldhaber (2016), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The effect size 
was calculated using the ordinary least-squares (OLS) coefficient. The single outcome presented here is based on an aggregated sample of elementary and middle school students 
separately reported in the original study. The authors provided unadjusted baseline and post-intervention means and standard deviations for the outcome at the WWC’s request. The 
authors reported p-values for some results, but not for the aggregated analysis. The WWC applied a correction for clustering and calculated the p-value reported in the table. This 
study is characterized as having an indeterminate effect because the estimated effect for the one measure in this domain is neither statistically significant nor substantively important. 
For more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26.
ᵇ For Fisher and Dickenson (2005), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The effect size was 
calculated using the unadjusted mean and standard deviation calculation. The single finding presented here is based on an aggregated sample of students in grades 4-8 reported 
separately by grade in the study. Because the outcome measure was not scaled to allow direct comparisons of scores across grades, the WWC standardized the scores and removed 
between-grade variation in the outcome means prior to aggregating across grades. The authors reported p-values for some results, but not for the aggregated analysis. The WWC 
applied a correction for clustering and calculated the p-value reported in the table. This study is characterized as having an indeterminate effect because the estimated effect for the 
one measure in this domain is neither statistically significant nor substantively important. For more information, please refer to the WWC Procedures and Standards Handbook (version 
3.0), p. 26. 


ᶜ For Gardner (2010), the WWC calculated the intervention group mean using a difference-in-differences approach by adding the impact of the intervention (i.e., difference in mean 
gains between the intervention and comparison groups) to the unadjusted comparison group posttest means. Please see the WWC Procedures and Standards Handbook (version 
3.0), p. 23, for more information. The WWC did not correct for multiple comparisons, and it could not correct for clustering because the number of teachers included in the study was 
unknown. The p-value presented here was calculated by the WWC. This study is characterized as having an indeterminate effect because the estimated effect for the one measure in 
this domain is neither statistically significant nor substantively important. For more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26.


ᵈ For Silver (2007), the WWC did not need to correct for clustering or multiple comparisons or to adjust for baseline differences. The WWC calculated the intervention group 
mean using a difference-in-differences approach by adding the impact of the intervention (i.e., difference in mean gains between the intervention and comparison groups) to the 
unadjusted comparison group posttest means. Please see the WWC Procedures and Standards Handbook (version 3.0), p. 23, for more information. The p-values presented here were 
calculated by the WWC. This study is characterized as having an indeterminate effect because the average effect across the two measures in this domain is neither statistically 
significant nor substantively important. For more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26.
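
The Gardner (2010) and Silver (2007) footnotes describe how the WWC derived the intervention group mean: the difference in mean gains (the estimated impact) is added to the comparison group's unadjusted posttest mean. A minimal Python sketch of that arithmetic follows; the function name is illustrative, and the pretest and posttest values in the example are hypothetical, because the appendix reports only the resulting posttest means.

```python
def did_intervention_mean(int_pre: float, int_post: float,
                          comp_pre: float, comp_post: float) -> float:
    """Intervention-group mean implied by a difference-in-differences adjustment.

    impact = (intervention gain) - (comparison gain); the reported intervention
    mean is the unadjusted comparison posttest mean plus that impact.
    """
    impact = (int_post - int_pre) - (comp_post - comp_pre)
    return comp_post + impact


# Hypothetical means (not from any study in this report): the intervention group
# gains 105 points and the comparison group gains 100, so the impact is 5.
print(did_intervention_mean(int_pre=790.0, int_post=895.0, comp_pre=800.0, comp_post=900.0))
```

Under this construction the mean difference in the table equals the estimated impact; for Gardner (2010), 923.93 - 921.47 = 2.46.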
Appendix D.1a: Supplemental findings for the mathematics achievement domain, elementary grades

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Cowan & Goldhaber (2016)ᵃ
Standardized Math Test | All students | 10,300 teachers/742,124 students | 0.02 (1.02) | 0.00 (1.00) | 0.02 | 0.02 | +1 | < .01
Standardized Math Test | English learners | 10,300 teachers/48,631 students | nr | nr | -0.01 | nr | nr | > JO
Standardized Math Test | Special education students | 10,300 teachers/92,937 students | nr | nr | 0.03 | nr | nr | < (i
Standardized Math Test | FRPL students | 10,300 teachers/331,924 students | nr | nr | 0.01 | nr | nr | > .10
Standardized Math Test | Students in high-poverty schools | 10,300 teachers/331,924 students | nr | nr | 0.04 | nr | nr | < .05
Standardized Math Test | Teachers have MC/GEN certifications | 11,050 teachers/127,168 students | nr | nr | 0.02 | nr | nr | <€ 06)
Standardized Math Test | Teachers have EMC/LRA certifications | 11,050 teachers/701,403 students | nr | nr | 0.03 | nr | nr | & IO)

Fisher & Dickenson (2005)ᵇ
Palmetto Achievement Challenge Test | Grade 4 | 98 teachers/666 students | 414.88 (13.30) | 414.16 (13.66) | 0.72 | 0.05 | +2 | .36
Palmetto Achievement Challenge Test | Grade 5 | 74 teachers/482 students | 511.90 (14.16) | 511.29 (15.08) | 0.61 | 0.04 | +2 | AQ
Palmetto Achievement Challenge Test | Grade 6 | 28 teachers/546 students | 616.58 (15.40) | 614.99 (15.05) | 1.59 | 0.10 | +4 | .03
Palmetto Achievement Challenge Test | Grade 4, FRPL | 98 teachers/322 students | 409.02 (11.42) | 409.13 (14.25) | -0.11 | -0.01 | 0 | .93
Palmetto Achievement Challenge Test | Grade 5, FRPL | 74 teachers/250 students | 506.01 (11.55) | 504.82 (13.52) | 1.19 | 0.09 | +4 | .34
Palmetto Achievement Challenge Test | Grade 6, FRPL | 28 teachers/254 students | 607.50 (13.83) | 607.24 (14.51) | 0.26 | 0.02 | +1 | .81
National Board for Professional Teaching Standards Certification February 2018 Page 31 


WWC Intervention Report 


Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value
Palmetto Achievement Challenge Test | Grade 4, non-FRPL | 98 teachers/344 students | 420.36 (12.62) | 418.86 (11.24) | 1.50 | 0.13 | +5 | alo)
Palmetto Achievement Challenge Test | Grade 5, non-FRPL | 74 teachers/232 students | 518.26 (14.01) | 518.27 (13.54) | -0.01 | -0.00 | 0 | > .99
Palmetto Achievement Challenge Test | Grade 6, non-FRPL | 28 teachers/292 students | 624.49 (11.98) | 621.73 (11.97) | 2.76 | 0.23 | +9 | < .05


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that meet WWC design standards with or without reservations, 
but do not factor into the determination of the intervention rating. For mean difference, effect size, and improvement index values reported in the table, a positive number favors 
the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing 
the average change expected for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate 
presentation of the effect size, reflecting the change in an average individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may 
not sum as expected due to rounding. nr = not reported. MC/GEN = Middle Childhood: Generalist certificate. EMC/LRA = Early and Middle Childhood: Literacy, Reading, and 
Language Arts certificate. FRPL indicates students eligible for free or reduced-price lunch.


ᵃ For Cowan and Goldhaber (2016), the p-values presented here were reported in the original study. A correction for clustering and for multiple comparisons within the elementary 
school grades was needed and resulted in a WWC-computed critical p-value of .005 for special education students, a WWC-computed critical p-value of .01 for students in high- 
poverty schools, a WWC-computed critical p-value of .02 for students whose teachers had MC/GEN certifications, a WWC-computed critical p-value of .02 for the apparently random 
sample of students whose teachers had EMC/LRA certifications, and a WWC-computed p-value of .03 for the apparently random sample of students; therefore, the WWC does not find 
these results to be statistically significant. Elementary school classrooms included primarily grades 3-5, with some grade 6 students. Apparently random samples refer to subgroups 
of schools where the demographic characteristics of the classrooms are similar to the characteristics of the whole school. High-poverty schools are defined as those eligible for the 
Challenging Schools Bonus, a $5,000 bonus awarded to teachers with NBPTS certification who work in high-poverty schools. Other certifications include all NBPTS certification areas 
except Middle Childhood: Generalist and Early and Middle Childhood: Literacy, Reading, and Language Arts. All analyses included fixed effects for student cohorts. Cohorts were 
defined by the combination of school, grade, and school year. The number of comparison teachers was estimated by the WWC based on the total number reported by the authors. 


ᵇ For Fisher and Dickenson (2005), the p-values presented here were reported in the original study. A correction for clustering and for multiple comparisons within the elementary 
school grades was needed and resulted in a WWC-computed p-value of .09 for grade 6 students not eligible for free/reduced-price lunch; therefore, the WWC does not find the result 
to be statistically significant. The effect size was calculated using the unadjusted mean and standard deviation calculation.
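
The WWC-computed critical p-values in these footnotes come from a correction for multiple comparisons. The sketch below assumes the Benjamini-Hochberg step-up procedure, which the WWC Procedures and Standards Handbook uses for multiple comparisons: with M contrasts ranked by p-value, the critical value for rank i is (i / M) × α. It does not reproduce the clustering adjustment that is applied first, and the p-values in the example are hypothetical.

```python
def bh_critical_values(m: int, alpha: float = 0.05) -> list[float]:
    """Benjamini-Hochberg critical value for each rank i = 1..m: (i / m) * alpha."""
    return [i / m * alpha for i in range(1, m + 1)]


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, in the original order, whether each contrast remains significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    critical = bh_critical_values(m, alpha)
    # Step-up rule: find the largest rank k whose p-value is at or below its
    # critical value; every contrast ranked 1..k is treated as significant.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= critical[rank - 1]:
            k_max = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= k_max
    return significant


# Hypothetical p-values for four subgroup contrasts within one domain.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```

In the example, the contrasts with p = .03 and p = .04 would be significant on their own at the .05 level but exceed their critical values of .025 and .0375, which is the pattern the footnotes describe when a reported result is not counted as statistically significant.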


Appendix D.1b: Description of supplemental findings for the mathematics achievement domain, middle school grades

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Cowan & Goldhaber (2016)ᵃ
Standardized Math Test | All students | 4,535 teachers/570,533 students | 0.03 (1.02) | -0.02 (0.99) | 0.05 | 0.05 | +2 | < .01
Standardized Math Test | EL students | 4,535 teachers/21,912 students | nr | nr | 0.06 | nr | nr | < {0
Standardized Math Test | FRPL students | 4,535 teachers/246,335 students | nr | nr | 0.06 | nr | nr | < .01
Standardized Math Test | Teachers have other certification areas | 4,535 teachers/514,930 students | nr | nr | 0.00 | nr | nr | S05

Fisher & Dickenson (2005)ᵇ
Palmetto Achievement Challenge Test | Grade 7 | 46 teachers/962 students | 710.81 (14.64) | 710.51 (13.56) | 0.30 | 0.02 | +1 | .60
Palmetto Achievement Challenge Test | Grade 8 | 42 teachers/680 students | 808.26 (12.87) | 807.54 (12.84) | 0.72 | 0.06 | +2 | Aly
Palmetto Achievement Challenge Test | Grade 7, FRPL | 46 teachers/484 students | 705.19 (12.85) | 705.79 (12.49) | -0.60 | -0.05 | -2 | .50
Palmetto Achievement Challenge Test | Grade 8, FRPL | 42 teachers/284 students | 801.77 (10.44) | 801.82 (10.29) | -0.05 | -0.01 | 0 | .95
Palmetto Achievement Challenge Test | Grade 7, non-FRPL | 46 teachers/478 students | 716.51 (14.15) | 715.28 (12.97) | 1.23 | 0.09 | +4 | al
Palmetto Achievement Challenge Test | Grade 8, non-FRPL | 42 teachers/396 students | 812.91 (12.45) | 811.65 (12.94) | 1.26 | 0.10 | +4 | > .05


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that meet WWC design standards with or without reservations, 
but do not factor into the determination of the intervention rating. For mean difference, effect size, and improvement index values reported in the table, a positive number favors 
the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing 
the average change expected for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate 
presentation of the effect size, reflecting the change in an average individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may 
not sum as expected due to rounding. nr = not reported. FRPL indicates students eligible for free or reduced-price lunch. EL = English learners. 


ᵃ For Cowan and Goldhaber (2016), a correction for multiple comparisons within the middle school grades was needed but did not affect whether any of the contrasts were found 
to be statistically significant. The p-values presented here were reported in the original study. Middle school classrooms included primarily grades 7-8, with some grade 6 students 
included. Other certifications include all NBPTS certification areas except Early Adolescence: Math. All analyses included fixed effects for student cohorts. Cohorts were defined by 
the combination of school, grade, and school year. The analyses for students in middle school classrooms and students of teachers with other certification areas in middle school 
classrooms included student cohort-by-track fixed effects.


ᵇ For Fisher and Dickenson (2005), a correction for clustering and for multiple comparisons within the table was needed but did not affect whether any of the contrasts were found to be 
statistically significant. The p-values presented here were reported in the original study. The effect size was calculated using the unadjusted mean and standard deviation calculation. 
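
The footnote above notes that the effect size was calculated from unadjusted means and standard deviations. A minimal Python sketch of that calculation follows, assuming the Hedges' g formula with a pooled within-group standard deviation and the small-sample correction ω = 1 − 3/(4N − 9) described in the WWC Procedures and Standards Handbook. Because the appendix reports only total sample sizes, the even split between groups in the example is a hypothetical assumption.

```python
import math


def hedges_g(mean_i: float, mean_c: float, sd_i: float, sd_c: float,
             n_i: int, n_c: int) -> float:
    """Hedges' g from unadjusted group means and standard deviations."""
    n = n_i + n_c
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / (n - 2))
    # Small-sample bias correction.
    omega = 1 - 3 / (4 * n - 9)
    return omega * (mean_i - mean_c) / pooled_sd


# Fisher & Dickenson (2005), grade 7 mathematics (Appendix D.1b): 962 students,
# split evenly between groups here only for illustration.
g = hedges_g(710.81, 710.51, 14.64, 13.56, 481, 481)
print(round(g, 2))  # 0.02
```

With large samples the correction factor is close to 1, so the result is driven almost entirely by the mean difference divided by the pooled standard deviation, and the assumed split has little influence on the rounded value.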


Appendix D.1c: Description of supplemental findings for the mathematics achievement domain, by free/reduced-price lunch (FRPL) eligibility in grades 4-8

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Fisher & Dickenson (2005)ᵃ
Palmetto Achievement Challenge Test | Grades 4-8, FRPL | 288 teachers/1,594 students | 0.00 (1.00) | 0.00 (1.00) | 0.00 | 0.00 | 0 | > .99
Palmetto Achievement Challenge Test | Grades 4-8, non-FRPL | 288 teachers/1,742 students | 0.11 (1.00) | 0.00 (1.00) | 0.11 | 0.11 | +4 | ail


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that meet WWC design standards with or without reservations, 
but do not factor into the determination of the intervention rating. For mean difference, effect size, and improvement index values reported in the table, a positive number favors 
the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing 
the average change expected for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate 
presentation of the effect size, reflecting the change in an average individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may 
not sum as expected due to rounding. 


ᵃ For Fisher and Dickenson (2005), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The effect size was 
calculated using the unadjusted mean and standard deviation calculation. The outcomes presented here are based on an aggregated sample of students in grades 4-8 separately 
reported in the original study. Because the outcome measure was not scaled to allow direct comparisons of scores across grades, the WWC standardized the scores and removed 
between-grade variation in the outcome means prior to aggregating across grades. The authors reported p-values for some results, but not for the aggregated analysis. The WWC 
applied a correction for clustering and calculated the p-value reported in the table. 
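
The footnote above says that the WWC standardized the scores and removed between-grade variation before aggregating across grades 4-8. The sketch below shows one way to do that under an assumption the footnote does not spell out: each grade's scores are converted to z-scores against that grade's comparison-group mean and standard deviation (consistent with the comparison means of 0.00 and standard deviations of 1.00 in the aggregated rows), and the z-scores are then pooled across grades. The function names and student scores are hypothetical.

```python
from statistics import mean, stdev


def standardize_within_grade(records):
    """records: iterable of (grade, group, score); group is "intervention" or "comparison".

    Returns (grade, group, z) tuples, where z is measured against that grade's
    comparison-group mean and standard deviation (an assumed reference group).
    """
    records = list(records)
    comparison_scores = {}
    for grade, group, score in records:
        if group == "comparison":
            comparison_scores.setdefault(grade, []).append(score)
    # Per-grade reference mean and sample standard deviation.
    reference = {g: (mean(s), stdev(s)) for g, s in comparison_scores.items()}
    return [(grade, group, (score - reference[grade][0]) / reference[grade][1])
            for grade, group, score in records]


def pooled_mean(z_records, group):
    """Mean z-score for one group after pooling all grades."""
    return mean(z for _, g, z in z_records if g == group)


# Hypothetical scores on grade-specific scales (grade 4 near 400, grade 5 near 500).
scores = [
    (4, "comparison", 400), (4, "comparison", 410), (4, "comparison", 420),
    (4, "intervention", 415),
    (5, "comparison", 500), (5, "comparison", 520), (5, "comparison", 540),
    (5, "intervention", 530),
]
z = standardize_within_grade(scores)
print(pooled_mean(z, "intervention"), pooled_mean(z, "comparison"))  # 0.5 0.0
```

Pooling raw scale scores here would mix a roughly 100-point gap between grades into the group means; standardizing within grade first removes that gap, so the pooled comparison mean is 0 and the intervention mean is expressed in comparison-group standard deviations.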


Appendix D.2a: Description of supplemental findings for the English language arts achievement domain, elementary grades

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Cowan & Goldhaber (2016)ᵃ
Standardized English Language Arts Test | All students | 10,300 teachers/742,124 students | 0.02 (1.00) | 0.00 (1.00) | 0.02 | 0.02 | +1 | < .01
Standardized English Language Arts Test | EL students | 10,300 teachers/48,631 students | nr | nr | 0.00 | nr | nr | > .05
Standardized English Language Arts Test | Special education students | 10,300 teachers/92,937 students | nr | nr | 0.02 | nr | nr | ean 05)
Standardized English Language Arts Test | FRPL students | 10,300 teachers/331,924 students | nr | nr | 0.02 | nr | nr | < .01
Standardized English Language Arts Test | Students in high-poverty schools | 10,300 teachers/105,091 students | nr | nr | 0.02 | nr | nr | < A
Standardized English Language Arts Test | Teachers have MC/GEN certifications | 10,300 teachers/727,768 students | nr | nr | 0.01 | nr | nr | e205)
Standardized English Language Arts Test | Teachers have other certifications | 10,300 teachers/696,335 students | nr | nr | 0.03 | nr | nr | > (0

Fisher & Dickenson (2005)ᵇ
Palmetto Achievement Challenge Test | Grade 4 | 100 teachers/410 students | 409.20 (11.24) | 407.32 (11.61) | 1.88 | 0.16 | +7 | .01
Palmetto Achievement Challenge Test | Grade 5 | 78 teachers/374 students | 503.83 (11.67) | 502.51 (9.76) | 1.32 | 0.12 | +5 | .08
Palmetto Achievement Challenge Test | Grade 6 | 48 teachers/848 students | 605.78 (14.21) | 606.31 (14.16) | -0.53 | -0.04 | -1 | .43
Palmetto Achievement Challenge Test | Grade 4, FRPL | 100 teachers/188 students | 403.31 (10.58) | 401.94 (10.96) | 1.37 | 0.13 | +5 | 122.
Palmetto Achievement Challenge Test | Grade 5, FRPL | 78 teachers/178 students | 498.70 (11.18) | 497.76 (8.99) | 0.94 | 0.09 | +4 | A6
Palmetto Achievement Challenge Test | Grade 6, FRPL | 48 teachers/354 students | 599.80 (14.06) | 600.19 (12.67) | -0.39 | -0.03 | -1 | .10
Palmetto Achievement Challenge Test | Grade 4, non-FRPL | 100 teachers/222 students | 414.20 (9.21) | 411.88 (10.13) | 2.32 | 0.33 | +9 | .02
Palmetto Achievement Challenge Test | Grade 5, non-FRPL | 78 teachers/196 students | 508.49 (10.08) | 506.82 (8.36) | 1.67 | 0.18 | +7 | .04
Palmetto Achievement Challenge Test | Grade 6, non-FRPL | 48 teachers/494 students | 610.07 (12.71) | 610.69 (13.56) | -0.62 | -0.05 | -2 | A7


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that meet WWC design standards with or without reservations, 
but do not factor into the determination of the intervention rating. For mean difference, effect size, and improvement index values reported in the table, a positive number favors 
the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing 
the average change expected for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate 
presentation of the effect size, reflecting the change in an average individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may 
not sum as expected due to rounding. nr = not reported. MC/GEN = Middle Childhood: Generalist certificate. FRPL indicates students eligible for free or reduced-price lunch. EL = 
English learners.


ᵃ For Cowan and Goldhaber (2016), the p-values presented here were reported in the original study. A correction for clustering and for multiple comparisons within the elementary school 
grades was needed and resulted in a WWC-computed p-value of .05 for special education students; therefore, the WWC does not find the result to be statistically significant. Elementary 
school classrooms included primarily grades 3-5, with some grade 6 students. Apparently random samples refer to subgroups of schools where the demographic characteristics of 
the classrooms are similar to the characteristics of the whole school. High-poverty schools are defined as those eligible for the Challenging Schools Bonus, a $5,000 bonus awarded to 
teachers with NBPTS certification who work in high-poverty schools. Other certifications include all NBPTS certification areas except Middle Childhood: Generalist and Early and Middle 
Childhood: Literacy, Reading, and Language Arts. All analyses included fixed effects for student cohorts. Cohorts were defined by the combination of school, grade, and school year.


ᵇ For Fisher and Dickenson (2005), the p-values presented here were reported in the original study. A correction for clustering and for multiple comparisons within the elementary 
school grades was needed and resulted in a WWC-computed critical p-value of .006 for grade 4 students and a WWC-computed critical p-value of .011 for grade 4 students not 
eligible for free/reduced-price lunch; therefore, the WWC does not find the results for either outcome to be statistically significant. A correction for clustering was needed and resulted 
in a WWC-computed p-value of .07 for grade 5 students not eligible for free/reduced-price lunch; therefore, the WWC does not find the result to be statistically significant. 
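
Several footnotes refer to a correction for clustering, which is needed because students are nested within teachers while the original analyses treated students as independent. A minimal Python sketch follows, assuming the adjustment described in the WWC Procedures and Standards Handbook (version 3.0): the t-statistic implied by the effect size is rescaled, and its degrees of freedom are recomputed from the number of clusters and an intraclass correlation (ICC), taken here as 0.20, the WWC default for achievement outcomes. The p-value uses a normal approximation to the t distribution, which is reasonable only when the adjusted degrees of freedom are large; the function names and the group split in the example are hypothetical.

```python
import math


def cluster_adjusted_t(g: float, n_i: int, n_c: int, clusters: int,
                       icc: float = 0.20) -> tuple[float, float]:
    """Clustering-adjusted t-statistic and degrees of freedom for a student-level analysis."""
    n = n_i + n_c
    t = g * math.sqrt(n_i * n_c / n)   # t-statistic implied by the effect size
    k = n / clusters                   # average cluster size (students per teacher)
    core = (n - 2) - 2 * (k - 1) * icc
    t_adj = t * math.sqrt(core / ((n - 2) * (1 + (k - 1) * icc)))
    df = core ** 2 / ((n - 2) * (1 - icc) ** 2
                      + k * (n - 2 * k) * icc ** 2
                      + 2 * (n - 2 * k) * icc * (1 - icc))
    return t_adj, df


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from the standard normal (large-df approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Fisher & Dickenson (2005), mathematics (Appendix C.1): g = 0.05 for 3,336 students
# of 288 teachers; an even 1,668/1,668 split is assumed for illustration only.
t_adj, df = cluster_adjusted_t(0.05, 1668, 1668, 288)
print(f"adjusted t = {t_adj:.2f}, df = {df:.0f}, p = {two_sided_p_normal(t_adj):.2f}")
```

Setting `icc=0.0` returns the unadjusted t-statistic with N − 2 degrees of freedom; with a positive ICC and roughly a dozen students per teacher, the adjusted t-statistic is much smaller, which is why a contrast that appears significant in a student-level analysis can lose significance after the correction.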


Appendix D.2b: Description of supplemental findings for the English language arts achievement domain, middle school grades

Outcome measure | Study sample | Sample size | Intervention mean (SD) | Comparison mean (SD) | WWC mean difference | WWC effect size | WWC improvement index | p-value

Cowan & Goldhaber (2016)ᵃ
Standardized English Language Arts Test | All students | 5,811 teachers/492,800 students | 0.05 (0.95) | 0.04 (0.97) | 0.01 | 0.01 | +1 | < .01
Standardized English Language Arts Test | EL students | 5,811 teachers/15,212 students | nr | nr | 0.03 | nr | nr | > .05
Standardized English Language Arts Test | FRPL students | 5,811 teachers/210,254 students | nr | nr | 0.01 | nr | nr | S 05
Standardized English Language Arts Test | Students in high-poverty schools | 5,811 teachers/107,646 students | nr | nr | 0.02 | nr | nr | S 05
Standardized English Language Arts Test | Teachers have EA/ELA certifications | 5,811 teachers/473,693 students | nr | nr | 0.01 | nr | nr | < .05
Standardized English Language Arts Test | Teachers have other certifications | 5,811 teachers/442,333 students | nr | nr | 0.01 | nr | nr | < .05

Fisher & Dickenson (2005)ᵇ
Palmetto Achievement Challenge Test | Grade 7 | 68 teachers/898 students | 705.71 (11.59) | 704.05 (10.80) | 1.66 | 0.15 | +6 | < .01
Palmetto Achievement Challenge Test | Grade 8 | 80 teachers/1,408 students | 806.58 (11.18) | 805.27 (11.17) | 1.31 | 0.12 | +5 | < .01
Palmetto Achievement Challenge Test | Grade 7, FRPL | 68 teachers/438 students | 700.60 (9.81) | 700.37 (9.44) | 0.23 | 0.02 | +1 | AS
Palmetto Achievement Challenge Test | Grade 8, FRPL | 80 teachers/644 students | 802.28 (10.42) | 800.20 (9.93) | 2.08 | 0.20 | +8 | < .01
Palmetto Achievement Challenge Test | Grade 7, non-FRPL | 68 teachers/460 students | 710.57 (11.07) | 707.55 (10.86) | 3.02 | 0.28 | +11 | < Mi
Palmetto Achievement Challenge Test | Grade 8, non-FRPL | 80 teachers/764 students | 810.20 (10.50) | 809.53 (10.38) | 0.67 | 0.06 | +3 | .20


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that meet WWC design standards with or without reservations, but do not factor 
into the determination of the intervention rating. For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a 
negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals 
who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an 
average individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. nr = not reported. EA/ELA = Early 
Adolescence: English Language Arts certificate. 


ᵃ For Cowan and Goldhaber (2016), a correction for multiple comparisons within the middle school grades was needed but did not affect whether any of 
the contrasts were found to be statistically significant. The p-values presented here were reported in the original study. Middle school classrooms included primarily grades 7-8, with 
some grade 6 students included. High-poverty schools are defined as those eligible for the Challenging Schools Bonus, a $5,000 bonus awarded to teachers with NBPTS certification 
who work in high-poverty schools. Other certifications include all NBPTS certification areas except Early Adolescence: English Language Arts. All analyses included fixed effects for 
student cohorts. Cohorts were defined by the combination of school, grade, and school year. The analyses for students in middle school classrooms, students of teachers with 
EA/ELA certifications in middle school classrooms, and students of teachers with other certification areas in middle school classrooms included cohort-by-track fixed effects.


b For Fisher and Dickenson (2005), the p-values presented here were reported in the original study. A correction for clustering and for multiple comparisons within the middle school
grades was needed and resulted in WWC-computed p-values of .08 for grade 7 students and .09 for grade 8 students; therefore, the WWC does not find these results to be statistically
significant. A correction for clustering and multiple comparisons was also needed and resulted in a WWC-computed critical p-value of .008 both for grade 8 students eligible for
free/reduced-price lunch and for grade 7 students not eligible for free/reduced-price lunch; therefore, the WWC does not find the result for either outcome to be statistically
significant.
Appendix D.2c: Description of supplemental findings for the English language arts achievement domain, by
free/reduced-price lunch (FRPL) eligibility in grades 4-8 


                                                           Mean
                                                   (standard deviation)                 WWC calculations
Outcome measure        Study sample    Sample size          Intervention   Comparison    Mean         Effect   Improvement
                                                            group          group         difference   size     index         p-value
Fisher & Dickenson (2005)a
Palmetto Achievement   FRPL            374 teachers/        0.10           0.00          0.10         0.10     +4            A
Challenge Test         students        1,802 students       (1.00)         (1.00)
Palmetto Achievement   Non-FRPL        374 teachers/        0.11           0.00          0.11         0.11     +4            .07
Challenge Test         students        Palate students      (1.01)         (1.00)


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that meet WWC design standards with or without reservations, 
but do not factor into the determination of the intervention rating. For mean difference, effect size, and improvement index values reported in the table, a positive number favors 
the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing 
the average change expected for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate 
presentation of the effect size, reflecting the change in an average individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may 
not sum as expected due to rounding. 


a For Fisher and Dickenson (2005), a correction for clustering was needed but did not affect whether any of the contrasts were found to be statistically significant. The effect size was
calculated using unadjusted means and standard deviations. The outcomes presented here are based on an aggregated sample of students in grades 4-8 whose results were
reported separately by grade in the original study. Because the outcome measure was not scaled to allow direct comparisons of scores across grades, the WWC standardized the scores and
removed between-grade variation in the outcome means prior to aggregating across grades. The authors reported p-values for some results, but not for the aggregated analysis; the
WWC applied a correction for clustering and calculated the p-value reported in the table.
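A minimal Python sketch of the standardize-then-aggregate step described in note a, working from grade-level summary statistics. It assumes each grade is re-expressed in units of that grade's comparison-group mean and standard deviation (consistent with the 0.00 (1.00) comparison entries in the table, but not stated in the note) and that pooling weights each grade by its sample size. The class and function names and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GradeSummary:
    """Summary statistics for one group in one grade (hypothetical structure)."""
    mean: float
    sd: float
    n: int


def standardize(group: GradeSummary, reference: GradeSummary) -> GradeSummary:
    """Express a group's mean and SD in SD units of the reference group for
    the same grade. Assumption: the reference is the comparison group."""
    return GradeSummary(
        mean=(group.mean - reference.mean) / reference.sd,
        sd=group.sd / reference.sd,
        n=group.n,
    )


def aggregate(grades: List[GradeSummary]) -> GradeSummary:
    """Pool standardized grade-level summaries into a single mean and SD.

    The pooled variance combines the within-grade variances with the spread
    of the grade means around the pooled mean, weighting each grade by n."""
    total = sum(g.n for g in grades)
    mean = sum(g.n * g.mean for g in grades) / total
    variance = sum(g.n * (g.sd ** 2 + (g.mean - mean) ** 2) for g in grades) / total
    return GradeSummary(mean=mean, sd=math.sqrt(variance), n=total)


# Hypothetical raw scale scores: each grade uses a different score range,
# so raw means cannot be averaged directly across grades.
grade7 = {"int": GradeSummary(705.0, 10.0, 200), "cmp": GradeSummary(703.0, 10.0, 220)}
grade8 = {"int": GradeSummary(806.0, 12.0, 180), "cmp": GradeSummary(803.0, 12.0, 210)}

pooled_int = aggregate([standardize(g["int"], g["cmp"]) for g in (grade7, grade8)])
pooled_cmp = aggregate([standardize(g["cmp"], g["cmp"]) for g in (grade7, grade8)])
print(round(pooled_int.mean, 3), round(pooled_cmp.mean, 3), round(pooled_cmp.sd, 3))
```

Under these assumptions the pooled comparison group comes out at mean 0 and SD 1, matching the form of the comparison entries above, while the intervention group's pooled mean is directly in standard deviation units.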
Endnotes


1 The descriptive information for this intervention comes from publicly available sources, specifically intervention websites (http://www.nbpts.org/
and http://www.boardcertifiedteachers.org/, downloaded April 2017). The What Works Clearinghouse (WWC) asks developers to review the
intervention description sections for accuracy from their perspective. The WWC provided the developer with the intervention description in April 2017
and incorporated feedback from the developer. Further verification of the accuracy of the descriptive information for this intervention is beyond the
scope of this review.


2 The maximum amount of time and the requirements to achieve NBPTS certification have varied over time.


3 The literature search reflects documents publicly available by March 2017. Reviews of the studies in this report used the standards
from the WWC Procedures and Standards Handbook (version 3.0) and the Teacher Training, Evaluation, and Compensation (TTEC)
review protocol (version 3.2). The evidence presented in this report is based on available research. Findings and conclusions may
change as new research becomes available. The WWC released a single study review of Goldhaber and Anthony (2007) in 2016. This
study was previously reviewed for a grant competition in 2016 and was rated meets standards with reservations. The study was
reviewed again under the TTEC protocol for this product and was rated does not meet standards. The difference arose because the
grant competition review rated a contrast that met standards but is not eligible under the TTEC protocol: comparing newly certified
teachers with teachers who failed certification. In consultation with the TTEC area content experts, we determined this contrast was
out of the scope of this review, as the comparison teachers had received some portions of the intervention and therefore did not
represent an untreated condition.


4 The studies were conducted in different locations. Cowan and Goldhaber (2016) included all school districts in Washington state; Fisher and
Dickenson (2005) included all school districts in South Carolina; Gardner (2010) included the Brevard County and Seminole County Public
School Districts in Florida; Silver (2007) included all school districts in North Carolina; and Stephens (2003) included two counties in
South Carolina. Stephens (2003) did not name the included counties.


5 Please see the Teacher Training, Evaluation, and Compensation review protocol (version 3.2) for a list of all outcome domains. 


6 For criteria used to determine the rating of effectiveness and extent of evidence, see the WWC Rating Criteria on p. 42. These
improvement index numbers show the average and range of individual-level improvement indices for all findings across the studies. 


7 The study did not report the number of students taught by the teachers, and the author did not respond to an author query.


8 The WWC identified one additional source related to Cowan and Goldhaber (2016). This source does not contribute unique information
to Appendix A.1 and is not listed here.


9 Weighted averages for each demographic characteristic were calculated by weighting the elementary and middle school demographic
characteristics by their share of the total student sample examined in the study.


10 The study also examined the effect of subgroups of teachers on student mathematics and English language arts achievement based
on whether the teacher passed NBPTS certification on the first or second attempt, and their scores for each attempt; these contrasts 
are ineligible for review because they do not focus on a subgroup of interest in the Teacher Training, Evaluation, and Compensation 
review protocol. 


11 Fisher and Dickenson (2005) also examined outcomes using hierarchical linear models and what the authors refer to as a “pilot 
analysis,” which included all teachers and students observed without any matching to balance baseline achievement; these contrasts 
do not meet WWC group design standards because equivalence of the analytic intervention and comparison groups is necessary and 
not demonstrated. 


12 The study examined outcomes in both the 2003-04 and 2004-05 school years. However, the WWC review focused only on the 
outcomes measured in the 2004-05 school year, as all intervention teachers were fully NBPTS-certified at the beginning of this school 
year. These teachers were still in the certification process at the beginning of the 2003-04 school year, and therefore, students in the 
intervention condition did not receive a full year of instruction from NBPTS-certified teachers. In addition, the WWC used the 2002-03 
school year as the baseline for assessing equivalence of the intervention and comparison conditions, for the same reason. 


13 Cowan and Goldhaber also present several mathematics and English language arts impact estimates among a subgroup of
elementary school students they refer to as an “apparently random sample.” This subgroup was identified by limiting the sample to
students whose classroom demographic characteristics were similar to the school-level demographics; in other words, there was no
evidence of student sorting across classrooms. These findings were generally of the same magnitude as those using the full sample
of students, but most were not statistically significant.


14 Cowan and Goldhaber also present several mathematics and English language arts impact estimates among middle school stu- 
dents using cohort-by-track fixed effects. These findings did not differ from the analyses of the same outcome using only cohort fixed 
effects. 
WWC Rating Criteria
Criteria used to determine the rating of a study 


Study rating Criteria 

Meets WWC group design standards without reservations
A study that provides strong evidence for an intervention’s effectiveness, such as a well-implemented RCT.

Meets WWC group design standards with reservations
A study that provides weaker evidence for an intervention’s effectiveness, such as a QED or an RCT with high attrition that has established
equivalence of the analytic samples.


Criteria used to determine the rating of effectiveness for an intervention 


Rating of effectiveness Criteria 


Positive effects Two or more studies show statistically significant positive effects, at least one of which met WWC group design 
standards without reservations, AND 
No studies show statistically significant or substantively important negative effects. 


Potentially positive effects At least one study shows a statistically significant or substantively important positive effect, AND 
No studies show a statistically significant or substantively important negative effect AND fewer or the same number 
of studies show indeterminate effects than show statistically significant or substantively important positive effects. 


Mixed effects At least one study shows a statistically significant or substantively important positive effect AND at least one study 
shows a statistically significant or substantively important negative effect, but no more such studies than the number 
showing a statistically significant or substantively important positive effect, OR 
At least one study shows a statistically significant or substantively important effect AND more studies show an 
indeterminate effect than show a statistically significant or substantively important effect. 


Potentially negative effects One study shows a statistically significant or substantively important negative effect and no studies show a
statistically significant or substantively important positive effect, OR
Two or more studies show statistically significant or substantively important negative effects, at least one study 
shows a statistically significant or substantively important positive effect, and more studies show statistically 
significant or substantively important negative effects than show statistically significant or substantively important 
positive effects. 


Negative effects Two or more studies show statistically significant negative effects, at least one of which met WWC group design 
standards without reservations, AND 
No studies show statistically significant or substantively important positive effects. 


No discernible effects None of the studies shows a statistically significant or substantively important effect, either positive or negative. 
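The effectiveness criteria above can be read as a decision procedure. The Python sketch below encodes them under assumptions that are not part of the criteria text: each study contributes one finding per domain, already classified into one of five hypothetical categories; the rules are checked in a fixed order (No discernible effects, Positive, Negative, Potentially positive, Mixed, Potentially negative); and the function returns None for combinations the criteria as written do not cover, such as two substantively important but nonsignificant negative findings and no positive ones.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical category labels for one study's finding in a domain.
SIG_POS = "sig_pos"            # statistically significant positive
SUB_POS = "sub_pos"            # substantively important positive, not significant
SIG_NEG = "sig_neg"            # statistically significant negative
SUB_NEG = "sub_neg"            # substantively important negative, not significant
INDETERMINATE = "indeterminate"


@dataclass
class StudyFinding:
    category: str
    without_reservations: bool  # study meets standards without reservations


def rate_effectiveness(studies: List[StudyFinding]) -> Optional[str]:
    """Apply the rating criteria in a fixed order; None if no rule matches."""
    pos = sum(s.category in (SIG_POS, SUB_POS) for s in studies)
    neg = sum(s.category in (SIG_NEG, SUB_NEG) for s in studies)
    ind = sum(s.category == INDETERMINATE for s in studies)
    sig_pos = [s for s in studies if s.category == SIG_POS]
    sig_neg = [s for s in studies if s.category == SIG_NEG]

    if pos == 0 and neg == 0:
        return "No discernible effects"
    if len(sig_pos) >= 2 and any(s.without_reservations for s in sig_pos) and neg == 0:
        return "Positive effects"
    if len(sig_neg) >= 2 and any(s.without_reservations for s in sig_neg) and pos == 0:
        return "Negative effects"
    if pos >= 1 and neg == 0 and ind <= pos:
        return "Potentially positive effects"
    # Mixed: positive and negative findings with negatives not outnumbering
    # positives, or more indeterminate studies than studies with an effect.
    if (pos >= 1 and neg >= 1 and neg <= pos) or ind > pos + neg:
        return "Mixed effects"
    if (neg == 1 and pos == 0) or (neg >= 2 and pos >= 1 and neg > pos):
        return "Potentially negative effects"
    return None


# One significant positive study and two indeterminate studies.
example = [StudyFinding(SIG_POS, False), StudyFinding(INDETERMINATE, False),
           StudyFinding(INDETERMINATE, False)]
print(rate_effectiveness(example))  # Mixed effects
```

In the example, the single positive finding is outnumbered by indeterminate findings, so the second Mixed effects clause applies rather than Potentially positive effects.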


Criteria used to determine the extent of evidence for an intervention 


Extent of evidence Criteria 


Medium to large The domain includes more than one study, AND 
The domain includes more than one school, AND 
The domain findings are based on a total sample size of at least 350 students, OR, assuming 25 students in a class, 
a total of at least 14 classrooms across studies. 


Small The domain includes only one study, OR 
The domain includes only one school, OR 
The domain findings are based on a total sample size of fewer than 350 students, AND, assuming 25 students in a
class, a total of fewer than 14 classrooms across studies.
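The extent-of-evidence criteria reduce to a short check. A Python sketch, assuming the inputs are totals across all studies in a domain and that a missing student or classroom count (as when a study does not report how many students its teachers taught) cannot help meet the threshold; the function and constant names are hypothetical.

```python
from typing import Optional

MIN_STUDENTS = 350
MIN_CLASSROOMS = 14  # 350 students / 25 students per class


def extent_of_evidence(n_studies: int, n_schools: int,
                       n_students: Optional[int] = None,
                       n_classrooms: Optional[int] = None) -> str:
    """Classify a domain's extent of evidence from totals across studies.

    Either sample total may be None when the studies did not report it."""
    enough_students = n_students is not None and n_students >= MIN_STUDENTS
    enough_classrooms = n_classrooms is not None and n_classrooms >= MIN_CLASSROOMS
    if n_studies > 1 and n_schools > 1 and (enough_students or enough_classrooms):
        return "Medium to large"
    return "Small"


print(extent_of_evidence(3, 10, n_students=1000))  # Medium to large
print(extent_of_evidence(1, 10, n_students=1000))  # Small
```

Because the sample-size condition is an OR, a domain with only 300 students can still qualify as medium to large if its studies span at least 14 classrooms.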
Glossary of Terms


Attrition 


Baseline 


Clustering adjustment 


Confounding factor 


Design 


Effect size 


Attrition occurs when an outcome variable is not available for all subjects initially assigned 
to the intervention and comparison groups. If a randomized controlled trial (RCT) or regres- 
sion discontinuity design (RDD) study has high levels of attrition, the validity of the study 
results can be called into question. An RCT with high attrition cannot receive the highest 
rating of Meets WWC Group Design Standards without Reservations, but can receive a 
rating of Meets WWC Group Design Standards with Reservations if it establishes baseline 
equivalence of the analytic sample. Similarly, the highest rating an RDD with high attrition 
can receive is Meets WWC RDD Standards with Reservations. 


For single-case design research, attrition occurs when an individual fails to complete all 
required phases or data points in an experiment, or when the case is a group and indi- 
viduals leave the group. If a single-case design does not meet minimum requirements for 
phases and data points within phases, the study cannot receive the highest rating of Meets 
WWC Pilot Single-Case Design Standards without Reservations. 


A point in time before the intervention was implemented in group design research and in
regression discontinuity design studies. When a study is required to satisfy the baseline
equivalence requirement, equivalence must be assessed using characteristics of the analytic
sample at baseline. In a single-case design experiment, the baseline condition is a period
during which participants are not receiving the intervention.


An adjustment to the statistical significance of a finding when the units of assignment
and analysis differ. When random assignment is carried out at the cluster level, outcomes
for individual units within the same clusters may be correlated. When the analysis is con- 
ducted at the individual level rather than the cluster level, there is a mismatch between 
the unit of assignment and the unit of analysis, and this correlation must be accounted for 
when assessing the statistical significance of an impact estimate. If the correlation is not 
accounted for in a mismatched analysis, the study may be too likely to report statistically 
significant findings. To fairly assess an intervention’s effects, in cases where study authors 
have not corrected for the clustering, the WWC applies an adjustment for clustering when 
reporting statistical significance. 
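A hedged Python sketch of how such an adjustment can work when students are analyzed individually but grouped in classrooms. It rescales the unadjusted t-statistic using the average cluster size and an intraclass correlation, in the form we understand the WWC Procedures and Standards Handbook to use; the default intraclass correlation of 0.20, the normal approximation for the p-value (in place of a t-distribution with adjusted degrees of freedom), the helper names, and the numbers are all assumptions for illustration.

```python
import math


def cluster_adjusted_t(effect_size: float, n_int: int, n_cmp: int,
                       n_clusters: int, icc: float = 0.20) -> float:
    """Return a t-statistic corrected for clustering (hypothetical helper).

    effect_size: standardized mean difference from an individual-level analysis
    n_clusters:  total number of clusters (e.g., classrooms or teachers)
    icc:         assumed intraclass correlation of the outcome
    """
    n_total = n_int + n_cmp
    t = effect_size / math.sqrt(1 / n_int + 1 / n_cmp)   # unadjusted t
    m = n_total / n_clusters                              # average cluster size
    numerator = (n_total - 2) - 2 * (m - 1) * icc
    denominator = (n_total - 2) * (1 + (m - 1) * icc)
    return t * math.sqrt(numerator / denominator)


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical example: 1,800 students taught by 90 teachers.
t_raw = cluster_adjusted_t(0.10, 900, 900, 90, icc=0.0)   # icc=0 leaves t unchanged
t_adj = cluster_adjusted_t(0.10, 900, 900, 90)
print(round(two_sided_p_normal(t_raw), 3), round(two_sided_p_normal(t_adj), 3))
```

In this hypothetical case, ignoring clustering gives a p-value of about .03, which would look statistically significant; after the correction it is about .33.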


A confounding factor is a component of a study that is completely aligned with one of the 
study conditions, making it impossible to separate how much of the observed effect was 
due to the intervention and how much was due to the factor. 


The method by which intervention and comparison groups are assigned (group design and 
regression discontinuity design) or the method by which an outcome measure is assessed 
repeatedly within and across different phases that are defined by the presence or absence 
of an intervention (single-case design). Designs eligible for WWC review are randomized 
controlled trials, quasi-experimental designs, regression discontinuity designs, and single- 
case designs. 


The effect size is a measure of the magnitude of an effect. The WWC uses a standardized 
measure to facilitate comparisons across studies and outcomes. 
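For continuous outcomes the WWC's standardized measure is Hedges' g: the mean difference divided by the pooled within-group standard deviation, multiplied by a small-sample correction. A minimal Python sketch, applied to the grade 8 FRPL Palmetto Achievement Challenge Test row in the supplemental table at the top of this section; the even 322/322 split of its 644 students is an assumption, since the table reports only the total.

```python
import math


def hedges_g(mean_int: float, sd_int: float, n_int: int,
             mean_cmp: float, sd_cmp: float, n_cmp: int) -> float:
    """Standardized mean difference with a small-sample correction."""
    pooled_sd = math.sqrt(((n_int - 1) * sd_int ** 2 + (n_cmp - 1) * sd_cmp ** 2)
                          / (n_int + n_cmp - 2))
    omega = 1 - 3 / (4 * (n_int + n_cmp) - 9)   # small-sample correction factor
    return omega * (mean_int - mean_cmp) / pooled_sd


# Grade 8, FRPL row: 802.28 (10.42) vs. 800.20 (9.93), 644 students in total.
# The 322/322 split below is assumed; the table reports only the total.
g = hedges_g(802.28, 10.42, 322, 800.20, 9.93, 322)
print(round(g, 2))
```

The result rounds to the 0.20 shown in the table; with a sample this large the correction factor is very close to 1.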


Eligibility A study is eligible for review and inclusion in this report if it falls within the scope of the
review protocol and uses either an experimental or matched comparison group design.
Extent of evidence


Gain scores 


Group design 


Improvement index 


Intervention 


Intervention report 


Multiple comparison 
adjustment 


An indication of how much evidence from group design studies supports the findings in an 
intervention report. The extent of evidence categorization for intervention reports focuses 
on the number and sizes of studies of the intervention in order to give an indication of how 
broadly findings may be applied to different settings. There are two extent of evidence cat- 
egories: small and medium to large. 


small: includes only one study, or one school, or findings based on a total sample size of
fewer than 350 students and fewer than 14 classrooms (assuming 25 students in a class)


medium to large: includes more than one study, more than one school, and findings based 
on a total sample of at least 350 students or 14 classrooms 


The result of subtracting the pretest from the posttest for each individual in the sample. 
Some studies analyze gain scores instead of the unadjusted outcome measure as a method 
of accounting for the baseline measure when estimating the effect of an intervention. The 
WWC reviews and reports findings from analyses of gain scores, but gain scores do not 
satisfy the WWC’s requirement for a statistical adjustment under the baseline equivalence 
requirement. This means that a study that must satisfy the baseline equivalence require- 
ment and has baseline differences between 0.05 and 0.25 standard deviations Does Not 
Meet WWC Group Design Standards if the study’s only adjustment for the baseline measure 
was in the construction of the gain score. 
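The gain-score rule can be written as a small decision function. This Python sketch assumes the baseline equivalence thresholds as we understand them from the WWC Handbook: differences of up to 0.05 standard deviations satisfy the requirement, differences above 0.25 do not, and differences in between require a statistical adjustment for the baseline measure. Only the 0.05 and 0.25 values appear in this entry; the function name and adjustment labels are hypothetical.

```python
def satisfies_baseline_equivalence(baseline_diff_sd: float, adjustment: str) -> bool:
    """Check baseline equivalence for one analytic sample (hypothetical helper).

    baseline_diff_sd: baseline difference in standard deviation units
    adjustment: "none", "gain_score", or "covariate" (hypothetical labels;
                "covariate" stands for a statistical adjustment for the baseline measure)
    """
    d = abs(baseline_diff_sd)
    if d <= 0.05:
        return True                        # no adjustment needed
    if d <= 0.25:
        # A gain score alone does not count as the required statistical adjustment.
        return adjustment == "covariate"
    return False                           # assumed: too large to be adjusted away


print(satisfies_baseline_equivalence(0.10, "gain_score"))  # False
print(satisfies_baseline_equivalence(0.10, "covariate"))   # True
```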


A study design in which outcomes for a group receiving an intervention are compared to 
those for a group not receiving the intervention. Comparison group designs eligible for 
WWC review are randomized controlled trials and quasi-experimental designs. 


Along a percentile distribution of individuals, the improvement index represents the gain or 
loss of the average individual due to the intervention. As the average individual starts at the 
50th percentile, the measure ranges from -50 to +50.
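The improvement index is the standard normal cumulative distribution evaluated at the effect size, converted to percentile points relative to the 50th percentile. A Python sketch using only the standard library; the function names are illustrative, and because the tabled indices are computed from unrounded effect sizes, recomputing from a rounded effect size can occasionally differ by a point.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def improvement_index(effect_size: float) -> float:
    """Expected change in percentile rank for an average individual."""
    return 100 * (normal_cdf(effect_size) - 0.5)


for es in (0.10, 0.20, 0.28):
    print(es, round(improvement_index(es)))
```

These reproduce the +4, +8, and +11 reported for effect sizes of 0.10, 0.20, and 0.28 in the supplemental tables earlier in this section.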


An educational program, product, practice, or policy aimed at improving student outcomes. 


A summary of the findings of the highest-quality research on a given program, product, 
practice, or policy in education. The WWC searches for all research studies on an interven- 
tion, reviews each against design standards, and summarizes the findings of those that 
meet WWC design standards. 


An adjustment to the statistical significance of results to account for multiple comparisons 
in a group design study. The WWC uses the Benjamini-Hochberg (BH) correction to adjust 
the statistical significance of results within an outcome domain when study authors perform 
multiple hypothesis tests without adjusting the p-value. The BH correction is used in three 
types of situations: studies that tested multiple outcome measures in the same outcome 
domain with a single comparison group; studies that tested a given outcome measure 
with multiple comparison groups; and studies that tested multiple outcome measures in 
the same outcome domain with multiple comparison groups. Because repeated tests of 
highly correlated constructs will lead to a greater likelihood of mistakenly concluding that 
the impact was different from zero, in all three situations, the WWC uses the BH correction 
to reduce the possibility of making this error. The WWC makes separate adjustments for 
primary and secondary findings. 
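A minimal Python sketch of the Benjamini-Hochberg step-up procedure for one family of p-values (for example, several outcome measures within one domain). It assumes a 0.05 false discovery rate; the critical value for the k-th smallest of m p-values is (k/m) times 0.05, and every p-value ranked at or below the largest rank that meets its critical value stays significant. The function name is illustrative.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return True for each p-value that stays significant after the BH correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest p first
    # Find the largest rank k whose p-value is at or below its critical value k/m * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Step-up rule: every p-value ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= k_max
    return significant


print(benjamini_hochberg([0.001, 0.20, 0.03]))  # [True, False, True]
```

In the example, .03 survives because its critical value as the second-smallest of three p-values is about .033, not .05.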
Outcome domain
Quasi-experimental 
design (QED) 


Randomized controlled 
trial (RCT) 


Rating of effectiveness 


Regression discontinuity 
design (RDD) 


Single-case design 


Standard deviation 


Statistical significance 


Study rating 


Substantively important 


Systematic review 


A group of closely related outcomes. A domain is the organizing construct for a set of
related outcomes through which studies claim effectiveness. 


A quasi-experimental design (QED) is a research design in which study participants are 
assigned to intervention and comparison groups through a process that is not random. 


A randomized controlled trial (RCT) is an experiment in which eligible study participants are 
randomly assigned to intervention and comparison groups. 


For group design research, the WWC rates the effectiveness of an intervention in each 
domain based on the quality of the research design and the magnitude, statistical signifi- 
cance, and consistency in findings. For single-case design research, the WWC rates the 
effectiveness of an intervention in each domain based on the quality of the research design 
and the consistency of demonstrated effects. The criteria for the ratings of effectiveness are 
given in the WWC Rating Criteria on p. 41. 


A design in which groups are created using a continuous scoring rule. For example, stu- 
dents may be assigned to a summer school program if they score below a preset point on a 
standardized test, or schools may be awarded a grant based on their score on an applica- 
tion. A regression line or curve is estimated for the intervention group and similarly for the 
comparison group, and an effect occurs if there is a discontinuity in the two regression lines 
at the cutoff. 
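The two-regression-line idea can be illustrated with a deliberately simple Python sketch: fit a straight line by ordinary least squares on each side of the cutoff and take the gap between the two fitted values at the cutoff. Real RDD analyses typically restrict the data to a bandwidth around the cutoff and check the functional form; this sketch does neither, assumes (as in the summer school example) that scores below the cutoff receive the intervention, and uses simulated data with a built-in effect of 5 points. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(scores: Sequence[float], outcomes: Sequence[float], cutoff: float) -> float:
    """Discontinuity at the cutoff: intervention-side fit minus comparison-side fit."""
    below = [(x, y) for x, y in zip(scores, outcomes) if x < cutoff]   # intervention
    above = [(x, y) for x, y in zip(scores, outcomes) if x >= cutoff]  # comparison
    a_int, b_int = ols_line([x for x, _ in below], [y for _, y in below])
    a_cmp, b_cmp = ols_line([x for x, _ in above], [y for _, y in above])
    return (a_int + b_int * cutoff) - (a_cmp + b_cmp * cutoff)


# Simulated data: outcome rises with the score, plus a 5-point boost below the cutoff of 50.
scores = list(range(100))
outcomes = [2 + 0.5 * s + (5 if s < 50 else 0) for s in scores]
print(round(rdd_estimate(scores, outcomes, cutoff=50), 6))
```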


A research approach in which an outcome variable is measured repeatedly within and 
across different conditions that are defined by the presence or absence of an intervention. 


The standard deviation of a measure shows how much variation exists across observations 
in the sample. A low standard deviation indicates that the observations in the sample tend 
to be very close to the mean; a high standard deviation indicates that the observations in 
the sample tend to be spread out over a large range of values. 


Statistical significance is the probability that the difference between groups is a result of 
chance rather than a real difference between the groups. The WWC labels a finding statisti- 
cally significant if the likelihood that the difference is due to chance is less than 5% (p < .05). 


The result of the WWC assessment of a study. The rating is based on the strength of the 
evidence of the effectiveness of the educational intervention. Studies are given a rating of 
Meets WWC Design Standards without Reservations, Meets WWC Design Standards with 
Reservations, or Does Not Meet WWC Design Standards, based on the assessment of the 
study against the appropriate design standards. The WWC has design standards for group 
design, single-case design, and regression discontinuity design studies. 


A substantively important finding is one that has an effect size of 0.25 or greater, regardless 
of statistical significance. 


A review of existing literature on a topic that is identified and reviewed using explicit
methods. A WWC systematic review has five steps: 1) developing a review protocol; 2) searching
the literature; 3) reviewing studies, including screening studies for eligibility, reviewing the
methodological quality of each study, and reporting on high-quality studies and their
findings; 4) combining findings within and across studies; and 5) summarizing the review.


Please see the WWC Procedures and Standards Handbook (version 3.0) for additional details. 


National Board for Professional Teaching Standards Certification February 2018 Page 43 


WWC Intervention Report 




Intervention Report    Practice Guide    Quick Review    Single Study Review


An intervention report summarizes the findings of high-quality research on a given program, practice, or policy in 
education. The WWC searches for all research studies on an intervention, reviews each against evidence standards, 
and summarizes the findings of those that meet standards. 


This intervention report was prepared for the WWC by Mathematica Policy Research under contract ED-IES-13-C-0010. 